ELK: ElasticDump and Python to create a data warehouse job

The amount of data collected in your ElasticSearch instance will keep growing, and at some point you will need to prune or warehouse older indexes so that your active collections are prioritized.

ElasticDump can assist by either moving your indexes to a distinct ElasticSearch instance that is set up specifically for long-term data, or exporting the data as JSON for later import into a warehouse like Hadoop. ElasticDump does not have a special filter for time-based indexes (index-YYYY.MM.DD), so you must specify exact index names.
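
Because exact index names are required, one option is to generate them from the date. The following is a minimal sketch, not a definitive implementation: the function name `daily_index_names`, the `logstash` prefix, and the choice to count today as the first day are all assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_index_names(prefix, days, today=None):
    """Return exact index names (prefix-YYYY.MM.DD), newest first.

    Assumption: the window includes `today` and goes back `days - 1` days.
    """
    today = today or date.today()
    names = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        # strftime yields zero-padded fields, matching index-YYYY.MM.DD
        names.append(f"{prefix}-{day.strftime('%Y.%m.%d')}")
    return names


# Example: the 14 daily index names ending today (prefix is hypothetical)
if __name__ == "__main__":
    for name in daily_index_names("logstash", 14):
        print(name)
```

Passing `today` explicitly makes the function easy to check against a known date; names generated this way may not all exist on the source instance, so they would still need to be compared against the instance's actual index list.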

In this article we will use Python to query a source ElasticSearch instance (an instance meant for near real-time querying that keeps a minimal amount of data) and export any indexes from the last 14 days into a target ElasticSearch instance (an instance meant for data warehousing that has more persistent storage and whose users expect multi-second query times).
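
As a rough sketch of the export step for a single index, the snippet below builds ElasticDump command lines using its `--input`, `--output`, and `--type` options. The helper names, the host URLs, the example index name, and the choice to copy the mapping before the data are assumptions for illustration, and the index name is assumed to be already known to exist on the source.

```python
import subprocess


def elasticdump_commands(index, source_url, target_url):
    """Build elasticdump argument lists for one index: mapping, then data."""
    commands = []
    for dump_type in ("mapping", "data"):
        commands.append([
            "elasticdump",
            f"--input={source_url}/{index}",
            f"--output={target_url}/{index}",
            f"--type={dump_type}",
        ])
    return commands


def export_index(index, source_url, target_url, dry_run=True):
    """Print the commands (dry run) or execute them with subprocess."""
    for argv in elasticdump_commands(index, source_url, target_url):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if elasticdump fails
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical hosts and index name; dry run only prints the commands
    export_index("logstash-2020.01.02",
                 "http://source-es:9200",
                 "http://warehouse-es:9200")
```

Defaulting to a dry run lets you inspect the generated commands before anything is written to the target instance.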
