ElasticSearch very often serves as a repository for monitoring, logging, and business data. As such, integrations with external system are a requirement.
The Go programming language with its convenient deployment binary and rich set of packages can easily serve as a bridge between these systems and the ElasticSearch server.
We will use the olivere/elastic package for this purpose, it is well maintained and has support for both ElasticSearch 5.x and 2.x depending on your import statement. In this article, we will be hitting an ElasticSearch 2.x backend.
Continue reading “ELK: Connecting to ElasticSearch with a Go client”
It is very common to have Logstash create time-based indexes in ElasticSearch that fit the format, <indexName>-YYYY.MM.DD. This means events submitted with @timestamp for that day all go to the same index.
However, if you do not explicitly specify an index template that maps each field to a type, you can end up with unexpected query results. The reason is that without explicit mappings, the index (that is created fresh each day) uses its best judgement to assign field types based on the first event inserted.
In this article, I’ll show you how to create explicit custom index templates so that field types are uniform across your time-series indexes.
Continue reading “ELK: Custom template mappings to force field types”
ElastAlert from the Yelp Engineering group provides a very flexible platform for alerting on conditions coming from ElasticSearch.
In a previous article I fully describe running interactively on an Ubuntu server, and now I’ll expand on that by running it at system startup using a System-V init script.
One of the challenges of getting ElastAlert to run as a service is that is has a very strict set of module requirements that very easily conflicts with other Python applications, and so we will use Python’s virtualenv to build it in isolation and then call that wrapper from the service script.
Continue reading “ELK: Running ElastAlert as a service on Ubuntu 14.04”
ElasticSearch’s Metricbeat is a lightweight shipper of both system and application metrics that runs as an agent on a client host. That means that along with standard cpu/mem/disk/network metrics, you can also monitor Apache, Docker, Nginx, Redis, etc. as well as create your own collector in the Go language.
In this article we will describe installing Metricbeat 5.x on Ubuntu when the back end ElasticSearch version is either 5.x or 2.x.
Continue reading “ELK: Installing MetricBeat for collecting system and application metrics”
ElasticSearch’s commercial X-Pack has alerting functionality based on ElasticSearch conditions, but there is also a strong open-source contender from Yelp’s Engineering group called ElastAlert.
ElastAlert offers developers the ultimate control, with the ability to easily create new rules, alerts, and filters using all the power and libraries of Python.
Continue reading “ELK: ElastAlert for alerting based on data from ElasticSearch”
By nature, the amount of data collected in your ElasticSearch instance will continue to grow and at some point you will need to prune or warehouse indexes so that your active collections are prioritized.
ElasticDump can assist in moving your indexes either to a distinct ElasticSearch instance that is setup specifically for long term data, or exporting the data as json for later import into a warehouse like Hadoop. ElasticDump does not have a special filter for time based indexes (index-YYYY.MM.DD), so you must specify exact index names.
In this article we will use Python to query a source ElasticSearch instance (an instance meant for near real-time querying, keeps minimal amount of data), and exports any indexes from the last 14 days into a target ElasticSearch instance (an instance meant for data warehousing, has more persistent storage and users expect multi-second query times).
Continue reading “ELK: ElasticDump and Python to create a data warehouse job”
The Curator product from ElasticSearch allows you to apply batch actions to your indexes (close, create, delete, etc.). One specific use case is applying a retention policy to your indexes, deleting any indexes that are older than a certain threshold.
Start by installing Curator using apt and pip:
$ sudo apt-get install python-pip -y
$ sudo pip install elasticsearch-curator
$ /usr/local/bin/curator --version
Continue reading “ELK: Using Curator to manage the size and persistence of your index storage”
The ElasticSearch stack (ELK) is popular open-source solution that serves as both repository and search interface for a wide range of applications including: log aggregation and analysis, analytics store, search engine, and document processing.
Its standard web front-end, Kibana, is a great product for data exploration and dashboards. However, if you have multiple data sources including ElasticSearch, want built-in LDAP authentication, or the ability to annotate graphs, you may want to consider Grafana to surface your dashboards and visualizations.
Continue reading “Grafana: Connecting to an ElasticSearch datasource”
The ELK stack (ElasticSearch-Logstash-Kibana), is a horizontally scalable solution with multiple tiers and points of extension and scalability.
Because so many companies have adopted the platform and tuned it for their specific use cases, it would be impossible to enumerate all the novel ways in which scalability and availability had been enhanced by load balancers, message queues, indexes on distinct physical drives, etc… So in this article I want to explore the obvious extension points, and encourage the reader to treat this as a starting point in their own design and deployment.
Continue reading “ELK: Architectural points of extension and scalability for the ELK stack”
The heart of the ELK stack is Elasticsearch. In order to provide high availability and scalability, it needs to be deployed as a cluster with master and data nodes. The Elasticsearch cluster is responsible for both indexing incoming data as well as searches against that indexed data.
As described in the documentation, if there is one absolutely critical resource it is memory. Keeping the heap size less than 32G will allow you to use compressed object pointers which is preferred. Swapping memory takes a big hit, so minimize swappiness on your Linux host.
Continue reading “ELK: Scaling an ElasticSearch Cluster”