ELK: Connecting to ElasticSearch with a Go client

ElasticSearch very often serves as a repository for monitoring, logging, and business data.  As such, integrations with external system are a requirement.

The Go programming language with its convenient deployment binary and rich set of packages can easily serve as a bridge between these systems and the ElasticSearch server.

We will use the olivere/elastic package for this purpose, it is well maintained and has support for both ElasticSearch 5.x and 2.x depending on your import statement.  In this article, we will be hitting an ElasticSearch 2.x backend.

Continue reading “ELK: Connecting to ElasticSearch with a Go client”

ELK: Custom template mappings to force field types

It is very common to have Logstash create time-based indexes in ElasticSearch that fit the format, <indexName>-YYYY.MM.DD.  This means events submitted with @timestamp for that day all go to the same index.

However, if you do not explicitly specify an index template that maps each field to a type, you can end up with unexpected query results.  The reason is that without explicit mappings, the index (that is created fresh each day) uses its best judgement to assign field types based on the first event inserted.

In this article, I’ll show you how to create explicit custom index templates so that field types are uniform across your time-series indexes.

Continue reading “ELK: Custom template mappings to force field types”

ELK: Installing Logstash on Ubuntu 14.04

elastic-logstash-fwLogstash provides a powerful mechanism for listening to various input sources, filtering and extracting the fields, and then sending events to a persistence store like ElasticSearch.

Installing Logstash on Ubuntu is well documented, so in this article I will focus on Ubuntu specific steps required for Logstash 2.x and 5.x.

Continue reading “ELK: Installing Logstash on Ubuntu 14.04”

ELK: Using Ruby in Logstash filters

elastic-logstash-fwLogstash has a rich set of filters, and you can even write your own, but often this is not necessary since there is a out-of-the-box filter that allows you to embed Ruby code directly in the configuration file.

Using logstash-filter-ruby, you can use all the power of Ruby string manipulation to parse an exotic regular expression, an incomplete date format, write to a file, or even make a web service call.

Continue reading “ELK: Using Ruby in Logstash filters”

ELK: Running ElastAlert as a service on Ubuntu 14.04

ElastAlert from the Yelp Engineering group provides a very flexible platform for alerting on conditions coming from ElasticSearch.

In a previous article I fully describe running interactively on an Ubuntu server, and now I’ll expand on that by running it at system startup using a System-V init script.

One of the challenges of getting ElastAlert to run as a service is that is has  a very strict set of module requirements that very easily conflicts with other Python applications, and so we will use Python’s virtualenv to build it in isolation and then call that wrapper from the service script.

Continue reading “ELK: Running ElastAlert as a service on Ubuntu 14.04”

ELK: Installing MetricBeat for collecting system and application metrics

ElasticSearch’s Metricbeat is a lightweight shipper of both system and application metrics that runs as an agent on a client host.  That means that along with standard cpu/mem/disk/network metrics, you can also monitor Apache, Docker, Nginx, Redis, etc. as well as create your own collector in the Go language.

In this article we will describe installing Metricbeat 5.x on Ubuntu when the back end ElasticSearch version is either 5.x or 2.x.

Continue reading “ELK: Installing MetricBeat for collecting system and application metrics”

ELK: ElastAlert for alerting based on data from ElasticSearch

ElasticSearch’s commercial X-Pack has alerting functionality based on ElasticSearch conditions, but there is also a strong open-source contender from Yelp’s Engineering group called ElastAlert.

ElastAlert offers developers the ultimate control, with the ability to easily create new rules, alerts, and filters using all the power and libraries of Python.

Continue reading “ELK: ElastAlert for alerting based on data from ElasticSearch”

ELK: ElasticDump and Python to create a data warehouse job

By nature, the amount of data collected in your ElasticSearch instance will continue to grow and at some point you will need to prune or warehouse indexes so that your active collections are prioritized.

ElasticDump can assist in moving your indexes either to a distinct ElasticSearch instance that is setup specifically for long term data, or exporting the data as json for later import into a warehouse like Hadoop.  ElasticDump does not have a special filter for time based indexes (index-YYYY.MM.DD), so you must specify exact index names.

In this article we will use Python to query a source ElasticSearch instance (an instance meant for near real-time querying, keeps minimal amount of data), and exports any indexes from the last 14 days into a target ElasticSearch instance (an instance meant for data warehousing, has more persistent storage and users expect multi-second query times).

Continue reading “ELK: ElasticDump and Python to create a data warehouse job”

ELK: Architectural points of extension and scalability for the ELK stack

elasticsearch-logoThe ELK stack (ElasticSearch-Logstash-Kibana), is a horizontally scalable solution with multiple tiers and points of extension and scalability.

Because so many companies have adopted the platform and tuned it for their specific use cases, it would be impossible to enumerate all the novel ways in which scalability and availability had been enhanced by load balancers, message queues, indexes on distinct physical drives, etc… So in this article I want to explore the obvious extension points, and encourage the reader to treat this as a starting point in their own design and deployment.

Continue reading “ELK: Architectural points of extension and scalability for the ELK stack”

ELK: Scaling an ElasticSearch Cluster

elasticsearch-logoThe heart of the ELK stack is Elasticsearch.  In order to provide high availability and scalability, it needs to be deployed as a cluster with master and data nodes.  The Elasticsearch cluster is responsible for both indexing incoming data as well as searches against that indexed data.


As described in the documentation, if there is one absolutely critical resource it is memory.  Keeping the heap size less than 32G will allow you to use compressed object pointers which is preferred.  Swapping memory takes a big hit, so minimize swappiness on your Linux host.

Continue reading “ELK: Scaling an ElasticSearch Cluster”

ELK: Feeding the logging pipeline

elasticsearch-logoThe most varied point in an ELK (Elasticsearch-Logstash-Kibana) stack is the mechanism by which custom events and logs will get sent to Logstash for processing.

Companies running Java applications with logging sent to log4j or SLF4J/Logback will have local log files that need to be tailed.  Applications running in containers may send everything to stdout/stderr, or have drivers for sending this on to syslog and other locations.  Network appliances tend to have SNMP or remote syslog outputs.

But regardless of the details, events must flow from their source to the Logstash indexing layer.  Doing this with maximized availability and scalability, and without putting excessive pressure on the Logstash indexing layer is the primary concern of this article.

Continue reading “ELK: Feeding the logging pipeline”

ELK: Federated Search with a Tribe node

elasticsearch-logoAlthough the ELK stack has rich support for clustering, clustering is not supported over WAN connections due to Elasticsearch being sensitive to latency.  There are also practical concerns of network throughput given how much data some installations index on an hourly basis.

So as nice as it would be to have a unified, eventually consistent cluster span across your North America and European datacenters, that is not currently a possibility.  Across availability zones in the same AWS datacenter will work, but not across different regions.

Federated Search

But first let’s consider why we want a distributed Elasticsearch cluster in the first place.  It is not typically for geo failover or disaster recovery (because we can implement that separately in each datacenter), but more often because we want end users to have a federated search experience.

We want end users to go to a single Kibana instance, regardless of which cluster they want to search, and be able to execute a search query against the data.  A Tribe node can bridge two distinct clusters for this purpose.

Continue reading “ELK: Federated Search with a Tribe node”

ELK: Pointing Kibana to a Client Node

elasticsearch-logoKibana is the end user web application that allows us to query Elasticsearch data and create dashboards that can be used for analysis and decision making.

Although Kibana can be pointed to any of the nodes in your Elasticsearch cluster, the best way to distribute requests across the nodes is to use a non-master, non-data Client node.  Client nodes have the following properties set in elasticsearch.yml:

cluster.name: mycluster
node.master: false
node.data: false

Continue reading “ELK: Pointing Kibana to a Client Node”

ELK: Performance of the Logstash Indexing layer

elasticsearch-logoThe Logstash Indexing layer receives data from any number of input sources, transforms the data, and then submits it to Elasticsearch for indexing.  Transforming and extracting data from every event can be both I/O as well as CPU intensive.

Horizontal or Vertical

Vertical scaling will only go so far in the Logstash indexing layer.  In order to keep up with the processing demand as well as provide availability, horizontal scalability must be employed.

And if you are going to have vertical scaling, you should be using either configuration management (SaltStack, Ansible, etc.) or containers to be able to create extra Logstash indexing instances without excessive manual steps.

Continue reading “ELK: Performance of the Logstash Indexing layer”

Syslog: Sending Java log4j2 to rsyslog on Ubuntu

log4j-logoLogging has always been a critical part of application development.  But the rise of OS virtualization, applications containers, and cloud-scale logging solutions has turned logging into something bigger that managing local debug files.

Modern applications and services are now expected to feed log aggregation and analysis stacks (ELK, Graylog, Loggly, Splunk, etc).  This can be done a multitude of ways, in this post I want to focus on modifying log4j2 so that it sends directly to an rsyslog server.

Even though we focus on sending to an Ubuntu ryslog server in this post, this could be any entity listening for syslog traffic, such as Logstash.

Continue reading “Syslog: Sending Java log4j2 to rsyslog on Ubuntu”