Apache Druid

Apache Druid® is a real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets. Druid most often powers use cases where real-time ingestion, fast query performance, and high uptime are important. It is commonly used as the database backend for the GUIs of analytical applications, or for highly concurrent APIs that need fast aggregations. Druid works best with event-oriented data.

Common application areas for Druid include:

  • Clickstream analytics including web and mobile analytics

  • Network telemetry analytics including network performance monitoring

  • Server metrics storage

  • Supply chain analytics including manufacturing metrics

  • Application performance metrics

  • Digital marketing/advertising analytics

  • Business intelligence/OLAP

If you are interested in a direct Decodable Connector for Druid, please contact support@decodable.co or join our Slack community and let us know!

Overview

Connector name: druid

Type: sink

Delivery guarantee: exactly once

Getting Started

Sending a Decodable data stream to Druid takes two stages: first, create a sink connector to a data source that Druid supports; then, add that data source to your Druid configuration. Decodable and Druid both support several technologies, including the following:

  • Amazon Kinesis

  • Apache Kafka

Configure As A Sink

This example uses Kafka as both the sink from Decodable and the source for Druid. Sign in to Decodable Web and follow the configuration steps in the Apache Kafka connector documentation to create a sink connector. For examples of using the command line tools or scripting, see the How To guides.

Create Kafka Data Source

To ingest event data, also known as message data, from Kafka into Druid, you must submit a supervisor spec. When you enable the Kafka indexing service, you can configure supervisors on the Overlord to manage the creation and lifetime of Kafka indexing tasks. Kafka indexing tasks read events using Kafka’s own partition and offset mechanism to guarantee exactly-once ingestion.
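As a rough illustration, a minimal supervisor spec can be assembled and submitted as sketched below. This is a sketch under stated assumptions, not a definitive configuration: the data source name, topic, broker address, timestamp column, and dimension names are all hypothetical placeholders, and the helper functions are illustrative rather than part of any Druid or Decodable library. The spec follows the general shape of a Druid Kafka supervisor spec, and the submit helper posts it to the Overlord's supervisor endpoint.

```python
import json
import urllib.request


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers,
                                timestamp_column, dimensions):
    """Build a minimal Kafka supervisor spec as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to a particular Druid or Decodable deployment.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submit_supervisor_spec(spec, overlord_url):
    """POST the spec to the Overlord (or Router) supervisor endpoint."""
    request = urllib.request.Request(
        url=f"{overlord_url}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Hypothetical values: replace with the topic your Decodable sink writes to.
spec = build_kafka_supervisor_spec(
    datasource="decodable_events",
    topic="decodable-events",
    bootstrap_servers="localhost:9092",
    timestamp_column="event_time",
    dimensions=["user_id", "page"],
)
print(json.dumps(spec, indent=2))
```

To submit the spec, call `submit_supervisor_spec(spec, "http://localhost:8081")`, replacing the URL with the address of your own Overlord or Router; the address shown is only a placeholder.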

By default, the Kafka indexing service supports the transactional topics introduced in Kafka 0.11.x. The consumer for the Kafka indexing service is incompatible with older Kafka brokers. If you are using an older version, refer to the Kafka upgrade guide. Additionally, you can set isolation.level to read_uncommitted in consumerProperties if either:

  • You don’t need Druid to consume transactional topics.

  • You need Druid to consume from older versions of Kafka. Make sure offsets are sequential, since there is no offset gap check in Druid anymore.
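
The isolation.level setting described above lives in the consumerProperties map of the supervisor spec's ioConfig. A minimal sketch, assuming a hypothetical helper name and a placeholder broker address:

```python
def build_consumer_properties(bootstrap_servers, read_uncommitted=False):
    """Return a consumerProperties map for a Kafka supervisor spec.

    When read_uncommitted is True, isolation.level is set so the consumer
    also reads messages from transactions that are not yet committed.
    """
    properties = {"bootstrap.servers": bootstrap_servers}
    if read_uncommitted:
        properties["isolation.level"] = "read_uncommitted"
    return properties


# Hypothetical broker address; substitute your own.
default_properties = build_consumer_properties("localhost:9092")
relaxed_properties = build_consumer_properties("localhost:9092", read_uncommitted=True)
print(relaxed_properties)
```

The key is only added when requested, so the default behavior is left unchanged unless one of the two conditions above applies.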

If your Kafka cluster enables consumer-group-based ACLs, you can set group.id in consumerProperties to override the default auto-generated group ID.
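
One way to apply this override is to patch an existing supervisor spec before submitting it. The sketch below assumes a spec held as a Python dict; the helper name, topic, broker address, and group ID are hypothetical.

```python
import copy


def with_consumer_group(supervisor_spec, group_id):
    """Return a copy of the spec with group.id set in consumerProperties.

    The input spec is not modified, so the original can still be reused.
    """
    updated = copy.deepcopy(supervisor_spec)
    io_config = updated.setdefault("spec", {}).setdefault("ioConfig", {})
    io_config.setdefault("consumerProperties", {})["group.id"] = group_id
    return updated


# Hypothetical, trimmed-down spec showing only the relevant section.
base_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "topic": "decodable-events",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
        }
    },
}
secured_spec = with_consumer_group(base_spec, "druid-decodable-ingest")
print(secured_spec["spec"]["ioConfig"]["consumerProperties"])
```

The group ID chosen here must match whatever the ACLs on your Kafka cluster allow.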

For more detailed information, please refer to Druid’s Kafka documentation.


Apache Kafka, Kafka®, Apache®, Druid®, and associated open source project names are either registered trademarks or trademarks of The Apache Software Foundation.