ELK: ElasticDump and Python to create a data warehouse job

By nature, the amount of data collected in your ElasticSearch instance will continue to grow and at some point you will need to prune or warehouse indexes so that your active collections are prioritized.

ElasticDump can assist in moving your indexes either to a distinct ElasticSearch instance that is setup specifically for long term data, or exporting the data as json for later import into a warehouse like Hadoop.  ElasticDump does not have a special filter for time based indexes (index-YYYY.MM.DD), so you must specify exact index names.

In this article we will use Python to query a source ElasticSearch instance (an instance meant for near real-time querying, keeps minimal amount of data), and exports any indexes from the last 14 days into a target ElasticSearch instance (an instance meant for data warehousing, has more persistent storage and users expect multi-second query times).

Continue reading “ELK: ElasticDump and Python to create a data warehouse job”

ELK: Architectural points of extension and scalability for the ELK stack

elasticsearch-logoThe ELK stack (ElasticSearch-Logstash-Kibana), is a horizontally scalable solution with multiple tiers and points of extension and scalability.

Because so many companies have adopted the platform and tuned it for their specific use cases, it would be impossible to enumerate all the novel ways in which scalability and availability had been enhanced by load balancers, message queues, indexes on distinct physical drives, etc… So in this article I want to explore the obvious extension points, and encourage the reader to treat this as a starting point in their own design and deployment.

Continue reading “ELK: Architectural points of extension and scalability for the ELK stack”

ELK: Scaling an ElasticSearch Cluster

elasticsearch-logoThe heart of the ELK stack is Elasticsearch.  In order to provide high availability and scalability, it needs to be deployed as a cluster with master and data nodes.  The Elasticsearch cluster is responsible for both indexing incoming data as well as searches against that indexed data.


As described in the documentation, if there is one absolutely critical resource it is memory.  Keeping the heap size less than 32G will allow you to use compressed object pointers which is preferred.  Swapping memory takes a big hit, so minimize swappiness on your Linux host.

Continue reading “ELK: Scaling an ElasticSearch Cluster”

SaltStack: Keeping Salt Pillar data encrypted using GPG

saltstack_logo-thumbnailWhen automating software and infrastructure, it is not uncommon to need to supply a user id and password for installation or other operations.  While it is certainly possible to pass these plaintext credentials directly in the state, this is not best practice.

# not best practice!!!

    - name: frank
    - password: "test3rdb"
    - host: localhost

There are several issues with this approach.

Continue reading “SaltStack: Keeping Salt Pillar data encrypted using GPG”