Druid Installation

Druid is a data store designed for high-performance slice-and-dice analytics (“OLAP”-style) on large data sets. Druid is most often used as a data store for powering GUI analytical applications, or as a backend for highly concurrent APIs that need fast aggregations. Common application areas for Druid include:

  • Clickstream analytics
  • Network flow analytics
  • Server metrics storage
  • Application performance metrics
  • Digital marketing analytics
  • Business intelligence / OLAP and more

Druid Three Node Cluster Setup

This post shows how to install Druid on a three-node cluster sized to your hardware. Druid can also be set up on a single machine, but we highly recommend a cluster for production because it is more efficient than a single machine.

By the end of this blog, you will be able to set up your own Druid cluster and load data into it.

Prerequisites

You will need:

  • Java 8
  • Linux, Mac OS X, or other Unix-like OS (Windows is not supported)

On Mac OS X, you can install Java using Oracle’s JDK 8. On Linux, your OS package manager will be able to help you install Java. If your Ubuntu-based OS does not have a recent version of Java, WebUpd8 offers packages for those OSes.

Download Druid

Download the latest version of Druid.

Extract Druid by running the following commands in your terminal:
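
The extraction step can be sketched as a small shell helper. This is only a sketch, assuming the release was downloaded as a gzipped tarball; the function name extract_druid and the file name in the example comment are illustrative assumptions, not something this post prescribes.

```shell
#!/bin/sh
# Unpack a Druid release tarball into a target directory (default: current directory).
# extract_druid is a hypothetical helper name; pass the path of the tarball you downloaded.
extract_druid() {
  tarball="$1"
  dest="${2:-.}"
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Example (assumed file name for the 0.13.0-incubating release):
#   extract_druid apache-druid-0.13.0-incubating-bin.tar.gz
#   cd apache-druid-0.13.0-incubating
```

Checking that the tarball exists first gives a clearer error than the one tar prints for a missing file.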

In the package, you should find:

  • bin/* – scripts useful for this quickstart
  • conf/* – template configurations for a clustered setup
  • extensions/* – core Druid extensions
  • hadoop-dependencies/* – Druid Hadoop dependencies
  • lib/* – libraries and dependencies for core Druid
  • quickstart/* – configuration files, sample data, and other files for the quickstart tutorials

Download ZooKeeper

Druid has a dependency on Apache ZooKeeper for distributed coordination. You’ll need to download and run ZooKeeper.

In the package root, run the following commands:

The startup scripts for the tutorial expect the contents of the ZooKeeper tarball to be located at zk under the apache-druid-0.13.0-incubating package root.
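
One way to get the ZooKeeper files into the zk directory is sketched below. It assumes you have already downloaded a ZooKeeper tarball whose contents sit under a single top-level directory; install_zk is a hypothetical helper name, and both arguments are paths you supply.

```shell
#!/bin/sh
# Unpack a ZooKeeper tarball and place its contents at <druid_root>/zk.
# install_zk is a hypothetical helper; it assumes one top-level directory in the tarball.
install_zk() {
  zk_tarball="$1"
  druid_root="$2"
  # Name of the top-level directory inside the tarball.
  top_dir=$(tar -tzf "$zk_tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top_dir" ] || return 1
  if [ -e "$druid_root/zk" ]; then
    echo "$druid_root/zk already exists" >&2
    return 1
  fi
  # Extract next to the Druid files, then rename the directory to zk.
  tar -xzf "$zk_tarball" -C "$druid_root" && mv "$druid_root/$top_dir" "$druid_root/zk"
}

# Example (paths are placeholders):
#   install_zk /path/to/zookeeper.tar.gz apache-druid-0.13.0-incubating
```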

Select hardware for a three-node configuration

Node  Service                          RAM   vCPUs  Disk (HDD/SSD)
1     Coordinator and Overlord         8GB   4      Min of 100GB
2     Broker                           8GB   4      Min of 100GB
3     Historicals and MiddleManagers   8GB   4      Min of 100GB

Note: The Broker needs more memory for query processing, so we give it a node of its own. The more memory it has, the faster queries run.

If you’re running a data-ingestion service on any node (Tranquility, Kafka, etc.), it can consume more than 8GB/16GB of memory.

Configure addresses for Druid coordination

In this simple cluster, you will deploy a single Druid Coordinator, a single Druid Overlord, a single ZooKeeper instance, and an embedded Derby metadata store on the same server.

In conf/druid/_common/common.runtime.properties, replace “zk.service.host” with the address of the machine that runs your ZK instance:

  • druid.zk.service.host

In conf/druid/_common/common.runtime.properties, replace “metadata.storage.*” with the address of the machine that you will use as your metadata store:

  • druid.metadata.storage.connector.connectURI
  • druid.metadata.storage.connector.host
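
Editing these properties by hand works fine; if you prefer to script it, the following is a minimal sketch. set_prop is a hypothetical helper (not part of Druid), it assumes the file uses plain key=value lines without spaces around the equals sign, and the host in the example comment is a placeholder.

```shell
#!/bin/sh
# Set key=value in a Java properties file: drop any existing assignment, then append the new one.
# set_prop is a hypothetical helper; it only matches lines that start with "key=".
set_prop() {
  file="$1"; key="$2"; value="$3"
  tmp="$file.tmp.$$"
  # Keep every line that does not start with "key=" (commented-out lines are kept).
  awk -v k="$key" 'index($0, k "=") != 1 { print }' "$file" > "$tmp" || return 1
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example with a placeholder address (substitute your own host):
#   set_prop conf/druid/_common/common.runtime.properties druid.zk.service.host 10.0.0.11
#   set_prop conf/druid/_common/common.runtime.properties druid.metadata.storage.connector.host 10.0.0.11
```

Writing to a temporary file and renaming it avoids relying on in-place editing flags, which differ between Linux and Mac OS X.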

Tune Druid Coordinator and Overlord

In <Druid/path/>/conf/druid/coordinator you will find two configuration files, jvm.config and runtime.properties.

JVM.CONFIG:

In jvm.config, change -Xms and -Xmx according to your system’s hardware configuration.

The -Xmx flag specifies the maximum memory allocation pool for a Java virtual machine (JVM), while -Xms specifies the initial memory allocation pool. This means that your JVM will start with -Xms amount of memory and will be able to use at most -Xmx amount of memory.

For our hardware, we can go up to:

  • -Xms3g
  • -Xmx3g
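
If you want to apply the heap settings from a script rather than an editor, here is a rough sketch. It assumes jvm.config holds one JVM flag per line; set_heap is a hypothetical helper name.

```shell
#!/bin/sh
# Replace the -Xms/-Xmx lines of a jvm.config file (one flag per line) with new values.
# set_heap is a hypothetical helper, not something shipped with Druid.
set_heap() {
  file="$1"; xms="$2"; xmx="$3"
  tmp="$file.tmp.$$"
  # Keep every flag except existing -Xms/-Xmx lines; grep exits 1 if nothing is left, which is fine.
  grep -v -e '^-Xms' -e '^-Xmx' "$file" > "$tmp" || true
  printf '%s\n%s\n' "-Xms$xms" "-Xmx$xmx" >> "$tmp"
  mv "$tmp" "$file"
}

# Example for the Coordinator values above:
#   set_heap conf/druid/coordinator/jvm.config 3g 3g
```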

RUNTIME.PROPERTIES:

In runtime.properties you can leave most settings at their defaults. The only properties you need to take care of are druid.service and druid.port, and you can add druid.host:

  • druid.service: druid/coordinator / druid/overlord
  • druid.port: 8081 / 8090
  • druid.host: coordinatorHost / overlordHost

Note: It is recommended to add the druid.host property when you set up a cluster, giving the specific IP of the node. Do not use “localhost” in druid.host. You can choose your own service names, ports, and hosts for the Coordinator and Overlord.
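
A quick sanity check for the druid.host advice might look like the sketch below. check_druid_host is a hypothetical helper; it assumes key=value lines without spaces around the equals sign and treats a missing value, "localhost", or a 127.x address as a mistake.

```shell
#!/bin/sh
# Print the druid.host value from a runtime.properties file, or fail if it is unset or loopback.
# check_druid_host is a hypothetical helper name.
check_druid_host() {
  file="$1"
  # If the key appears more than once, the last assignment wins.
  host=$(sed -n 's/^druid\.host=//p' "$file" | tail -n 1)
  case "$host" in
    "") echo "druid.host is not set in $file" >&2; return 1 ;;
    localhost|127.*) echo "druid.host must not be a loopback name: $host" >&2; return 1 ;;
  esac
  echo "$host"
}

# Example:
#   check_druid_host conf/druid/coordinator/runtime.properties
```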

Tune Druid Broker

As with the Coordinator and Overlord, set -Xms and -Xmx according to the memory available on your node.

JVM.CONFIG

For our hardware, we can go up to:

  • -Xms2g
  • -Xmx4g

RUNTIME.PROPERTIES

The most important properties for the Broker are the buffer size and the cache size.

The buffer size (druid.processing.buffer.sizeBytes) is the size of each off-heap scratch buffer that Druid’s query processing uses to hold intermediate results.

The cache here is Druid’s own query cache, not the CPU cache: it keeps query results in memory so that repeated queries can be answered without recomputing them. druid.cache.sizeInBytes sets the maximum size of that cache.

For our hardware, typical values are:

  • druid.processing.buffer.sizeBytes=25600000
  • druid.cache.sizeInBytes=1000000

Increasing the cache and buffer sizes, as far as the node’s memory allows, can speed up your Druid queries.
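
Before raising the buffer size, it helps to estimate how much direct (off-heap) memory the processing buffers will need. Druid documents the requirement as druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1). The sketch below just does that arithmetic; direct_memory_needed is a hypothetical helper, and the thread and merge-buffer counts in the example are assumed values, not settings from this post.

```shell
#!/bin/sh
# Estimate the direct memory (in bytes) needed for Druid's processing buffers:
#   buffer.sizeBytes * (numMergeBuffers + numThreads + 1)
# direct_memory_needed is a hypothetical helper name.
direct_memory_needed() {
  buffer_bytes="$1"; num_threads="$2"; num_merge_buffers="$3"
  echo $(( buffer_bytes * (num_merge_buffers + num_threads + 1) ))
}

# Example: the 25600000-byte buffer with 3 processing threads and 2 merge buffers (assumed values)
#   direct_memory_needed 25600000 3 2    # prints 153600000
```

The result is the minimum to leave for direct memory on top of the heap set with -Xmx.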

Tune Druid Historical and MiddleManager

The Historical and MiddleManager services can run on a single node or on different nodes. If they share a node, set -Xms and -Xmx in each service’s jvm.config so that both fit in that node’s memory. Give the Historical a larger -Xmx than the MiddleManager, because the Historical takes over the data once an ingestion task has been created and has succeeded.

JVM.CONFIG

Historical

  • -Xms2g
  • -Xmx2g

MiddleManager

  • -Xms64m
  • -Xmx1g

RUNTIME.PROPERTIES

The buffer size and cache size for both services are the same as for the Broker node.

For the MiddleManager, we can speed up tasks by increasing the heap given in druid.indexer.runner.javaOpts, for example from -Xmx1g to -Xmx2g.

If you are using different hardware, we recommend adjusting configurations for your specific hardware. The most commonly adjusted configurations are:

  • -Xmx and -Xms
  • druid.server.http.numThreads
  • druid.processing.buffer.sizeBytes
  • druid.processing.numThreads
  • druid.query.groupBy.maxIntermediateRows
  • druid.query.groupBy.maxResults
  • druid.server.maxSize and druid.segmentCache.locations on Historical Nodes
  • druid.worker.capacity on MiddleManagers

Unblock Firewall

If you’re using a firewall or some other system that only allows traffic on specific ports, allow inbound connections on the following:

  • 1527 (Derby on your Coordinator; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
  • 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
  • 8081 (Coordinator)
  • 8082 (Broker)
  • 8083 (Historical)
  • 8084 (Standalone Realtime, if used)
  • 8088 (Router, if used)
  • 8090 (Overlord)
  • 8091, 8100–8199 (Druid Middle Manager; you may need higher than port 8199 if you have a very high druid.worker.capacity)
  • 8200 (Tranquility Server, if used)
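
To keep track of which ports each of the three nodes needs, the list above can be grouped by node as in this sketch. The role names master, broker, and data are labels made up here for nodes 1, 2, and 3 of the hardware table; druid_ports is a hypothetical helper, and the ufw loop in the comment assumes your servers use ufw as their firewall.

```shell
#!/bin/sh
# Print the inbound ports for one node of the three-node layout described in this post.
# Role names are hypothetical labels: master = Coordinator + Overlord (with Derby and ZooKeeper),
# broker = Broker, data = Historical + MiddleManager. "8100:8199" is a port range.
druid_ports() {
  case "$1" in
    master) echo "1527 2181 8081 8090" ;;
    broker) echo "8082" ;;
    data)   echo "8083 8091 8100:8199" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Example, assuming ufw is the firewall in use:
#   for p in $(druid_ports data); do sudo ufw allow "$p/tcp"; done
```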

Start Coordinator And Overlord

On your coordination server, cd into the distribution and start up the coordination services (you should do this in different windows or pipe the log to a file):

  • java `cat conf/druid/coordinator/jvm.config | xargs` -cp conf/druid/_common:conf/druid/coordinator:lib/* org.apache.druid.cli.Main server coordinator
  • java `cat conf/druid/overlord/jvm.config | xargs` -cp conf/druid/_common:conf/druid/overlord:lib/* org.apache.druid.cli.Main server overlord

You should see a log message printed out for each service that starts up. You can view detailed logs for any service by looking in the var/log/druid directory using another terminal.
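
If you would rather not keep a terminal window open per service, the commands above can be wrapped so each service runs in the background with its console output sent to a file. This is a sketch under assumptions: start_druid_service and the JAVA_BIN override are names invented here, the output file name is a placeholder, and it must be run from the Druid package root.

```shell
#!/bin/sh
# Start one Druid service in the background, sending its console output to a file.
# start_druid_service and JAVA_BIN are hypothetical names; run this from the Druid package root.
start_druid_service() {
  svc="$1"
  out_file="${2:-$svc.out}"
  if [ ! -f "conf/druid/$svc/jvm.config" ]; then
    echo "missing conf/druid/$svc/jvm.config" >&2
    return 1
  fi
  # The jvm.config flags are deliberately left unquoted so each flag becomes its own argument.
  nohup "${JAVA_BIN:-java}" $(xargs < "conf/druid/$svc/jvm.config") \
    -cp "conf/druid/_common:conf/druid/$svc:lib/*" \
    org.apache.druid.cli.Main server "$svc" \
    > "$out_file" 2>&1 &
  # Print the process ID so the service can be stopped later.
  echo "$!"
}

# Example:
#   start_druid_service coordinator coordinator.out
#   start_druid_service overlord overlord.out
```

The same helper works for the historical, middleManager, and broker services, since their commands differ only in the service name.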

Start Historicals and Middle Managers

Copy the Druid distribution and your edited configurations to your servers set aside for the Druid Historicals and MiddleManagers.

On each one, cd into the distribution and run these commands to start a Data server:

  • java `cat conf/druid/historical/jvm.config | xargs` -cp conf/druid/_common:conf/druid/historical:lib/* org.apache.druid.cli.Main server historical
  • java `cat conf/druid/middleManager/jvm.config | xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/* org.apache.druid.cli.Main server middleManager

You can add more servers with Druid Historicals and MiddleManagers as needed.

Start Druid Broker

  • java `cat conf/druid/broker/jvm.config | xargs` -cp conf/druid/_common:conf/druid/broker:lib/* org.apache.druid.cli.Main server broker

You can add more Brokers as required based on query load.

-Blog by Sai Chandra