ELK: Scaling an ElasticSearch Cluster

The heart of the ELK stack is Elasticsearch.  To provide high availability and scalability, it needs to be deployed as a cluster with master and data nodes.  The Elasticsearch cluster is responsible for both indexing incoming data and serving searches against that indexed data.


As described in the documentation, memory is the single most critical resource.  Keeping the heap size below 32 GB allows the JVM to use compressed object pointers, which is preferred.  Swapping severely degrades performance, so minimize swappiness on your Linux host.
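As a minimal sketch of the swapping advice, one common complement to lowering swappiness is to ask Elasticsearch to lock its memory.  The fragment below assumes an Elasticsearch 1.x/2.x release, where the setting is named bootstrap.mlockall; from 5.0 onward it was renamed bootstrap.memory_lock.  The heap size itself is not set in this file.

```yaml
# elasticsearch.yml (sketch, assuming Elasticsearch 1.x/2.x)

# Lock the process address space into RAM so the heap is never swapped out.
# In Elasticsearch 5.0 and later this setting is named bootstrap.memory_lock.
bootstrap.mlockall: true

# Not set here:
#   - heap size is configured outside this file and should stay below 32 GB
#   - swappiness is an OS setting (vm.swappiness via sysctl on Linux)
```

Memory locking only takes effect if the user running Elasticsearch is allowed to lock memory, so it is worth confirming after startup that the setting was actually applied.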


ELK: Federated Search with a Tribe node

Although the ELK stack has rich support for clustering, clustering is not supported over WAN connections because Elasticsearch is sensitive to latency.  There are also practical concerns about network throughput, given how much data some installations index on an hourly basis.

So as nice as it would be to have a unified, eventually consistent cluster span your North American and European datacenters, that is not currently a possibility.  Spanning availability zones within the same AWS region will work, but spanning different regions will not.

Federated Search

But first let’s consider why we want a distributed Elasticsearch cluster in the first place.  It is not typically for geo failover or disaster recovery (because we can implement that separately in each datacenter), but more often because we want end users to have a federated search experience.

We want end users to go to a single Kibana instance, regardless of which cluster they want to search, and be able to execute a search query against the data.  A Tribe node can bridge two distinct clusters for this purpose.
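A minimal sketch of what the Tribe node's elasticsearch.yml might look like is shown here.  The tribe names (na, eu), cluster names, and hostnames are hypothetical placeholders, and the unicast host lists assume zen unicast discovery is in use.

```yaml
# elasticsearch.yml on the Tribe node (sketch; all names are placeholders)
tribe:
  na:                                   # arbitrary label for the first cluster
    cluster.name: cluster_northamerica  # must match that cluster's cluster.name
    discovery.zen.ping.unicast.hosts: ["es-na-1.example.com"]
  eu:                                   # arbitrary label for the second cluster
    cluster.name: cluster_europe
    discovery.zen.ping.unicast.hosts: ["es-eu-1.example.com"]
```

The Tribe node joins each listed cluster as a client, so a Kibana instance pointed at it can search indices from both.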


ELK: Pointing Kibana to a Client Node

Kibana is the end-user web application that allows us to query Elasticsearch data and create dashboards for analysis and decision making.

Although Kibana can be pointed to any of the nodes in your Elasticsearch cluster, the best way to distribute requests across the nodes is to use a non-master, non-data Client node.  Client nodes have the following properties set in elasticsearch.yml:

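cluster.name: mycluster   # must match the name of the cluster to join
node.master: false        # never eligible to be elected master
node.data: false          # holds no index data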
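With a Client node in place, Kibana can be pointed at it.  The sketch below assumes the Client node runs on the same host as Kibana and listens on the default HTTP port 9200, and that the Kibana release uses the elasticsearch.url key (Kibana 4.0 and 4.1 used elasticsearch_url instead).

```yaml
# kibana.yml (sketch; host and port are assumptions)

# Send all Kibana requests to the local Client node, which then
# distributes them across the cluster.
elasticsearch.url: "http://localhost:9200"
```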


Node.js: Packaging modules for offline deployment using npm-bundle

In a production environment, it is common for the deployment hosts to have restricted internet access.  This means that running the standard ‘npm install’ and pulling modules from the registry.npmjs.org repository is not an option.

Given the breadth of the dependency graph required for most modules, this packaging is something you want automated without needing to modify the package.json file by hand.

After failed attempts with npmbox, scripts wrapping ‘npm pack’, and archiving the entire node_modules directory, the npm-bundle module finally provided a proper solution.
