Python: Using Python, JSON, and Jinja2 to construct a set of Logstash filters

Python is a language whose advantages are well documented, and its presence on most Linux distributions makes it well suited to quick scripting duties.

In this article I’ll go through an example of using Python to read entries from a JSON file and create a local file from each entry. We’ll use the Jinja2 templating language to generate each file from a base template.

Our particular example will be the generation of Logstash filters for log processing, but the techniques for using JSON to drive Python processing or Jinja2 templating within Python are general purpose.

Logstash Filters Explained

For those not familiar with Logstash: it receives log input from various sources and processes each log entry. Each log source (e.g. web application, database, web proxy, system event) typically has a different format, so Logstash must understand each format in order to extract the fields of each line properly (e.g. timestamps, response codes).

This requires that multiple filters be defined, and in our example we want each filter to be in a distinct file.  To help simplify our example, we will assume that a ‘type’ has already been assigned to each source.

As an example, let’s say that we need a filter for Apache logs in the Combined Log Format. An incoming line would look like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

And the filter file we create would look like the following (Logstash ships with a predefined grok pattern, COMBINEDAPACHELOG, for this format):

filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

Meanwhile, one of our custom Java web applications might be emitting log lines that look like:

2016-06-21 11:35:46.206 DEBUG 31 --- [http-nio-61010-exec-7] a.b.s.c.m.a.RequestHandlerMap : finding method for path /main/tr:radco#main/download/

That format would require a filter like:

filter {
  if [type] == "customer-webapp" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{DATA:level} %{NUMBER} --- \[%{DATA:threadname}\] %{DATA:classname} : %{GREEDYDATA:msg}" }
    }
  }
}

As you can see, the filter files follow the same template format, but have different [type] values and regular expressions.
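
As a rough illustration of what the grok match in the second filter extracts, the sketch below parses the sample line with Python's re module. The regular expression is a hand-written approximation, not the real grok definitions of TIMESTAMP_ISO8601, DATA, or GREEDYDATA (which are more permissive), and the name LINE_RE is illustrative only.

```python
import re

# Hand-written approximation of the grok pattern from the filter above.
# Each named group corresponds to one %{PATTERN:name} capture.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>\S+) "
    r"\d+ --- "
    r"\[(?P<threadname>.*?)\] "
    r"(?P<classname>.*?) : "
    r"(?P<msg>.*)"
)

line = (
    "2016-06-21 11:35:46.206 DEBUG 31 --- [http-nio-61010-exec-7] "
    "a.b.s.c.m.a.RequestHandlerMap : finding method for path "
    "/main/tr:radco#main/download/"
)

# groupdict() returns the named captures as a plain dictionary
fields = LINE_RE.match(line).groupdict()
for name, value in sorted(fields.items()):
    print(name + ": " + value)
```

Logstash attaches the captured values to the event as fields in much the same way, which is why each log format needs its own pattern.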

In a real environment with many different log sources, it pays to keep the list of types and regexes in one central file and to generate each filter file rather than typing each by hand. This is less error-prone, and it also lets us change the filter definitions in bulk if necessary.

JSON Structure

The first step will be the design of our data structure.   In a file named “filters.json” we define the following data structure:

{
  "filters": [
    {
      "template": "logstash-template.conf",
      "type": "apache-access",
      "regex": "%{COMBINEDAPACHELOG}"
    },
    {
      "template": "logstash-template.conf",
      "type": "customer-webapp",
      "regex": "%{TIMESTAMP_ISO8601:ts} %{DATA:level} \\[%{DATA:threadname}\\] \\[%{DATA}\\] \\[%{DATA:classname}:%{NUMBER}\\] %{GREEDYDATA:msg}"
    }
  ]
}

Take note of the escaped backslashes in the regex for customer-webapp. The regex itself needs a backslash before a literal bracket, and JSON in turn requires that backslash to be escaped with a second one.
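
The effect of the double backslash can be checked with Python's json module. This is a minimal sketch using a shortened, made-up one-key document rather than the full filters.json:

```python
import json

# Raw string so Python itself leaves the backslashes alone;
# the JSON text therefore contains two backslashes before each bracket.
raw = r'{"regex": "\\[%{DATA:threadname}\\]"}'

parsed = json.loads(raw)

# After JSON decoding, a single backslash remains before each bracket,
# which is what the regex engine needs to see.
print(parsed["regex"])
```

The decoded value is what ends up in the Jinja2 context, so the generated filter file contains the single-backslash form.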

Template File

Then we create the template file ‘logstash-template.conf’ that has Jinja2 placeholders and logic:

filter {

if [type]=="{{type}}" {

  grok {
    match => { "message" => "{{regex}}" }
  }

} # end if type

} # end filter

Keep in mind that Jinja2 is a full-fledged template engine capable of much more than simple text replacement, including loop iteration, filters, and macros. That power is why it is used to drive complex scenarios in other Python utilities, such as SaltStack for infrastructure automation.
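
As a small taste of loop iteration, here is a sketch that renders every filter into one combined string instead of one file per entry. The combined template and the shortened regex for the second entry are made up for this illustration and are not part of the article's files.

```python
import jinja2

# Hypothetical combined template: a single filter block with one
# conditional per entry, produced by a Jinja2 for-loop.
COMBINED_TEMPLATE = """\
filter {
{% for f in filters %}
  if [type] == "{{ f.type }}" {
    grok {
      match => { "message" => "{{ f.regex }}" }
    }
  }
{% endfor %}
}
"""

filters = [
    {"type": "apache-access", "regex": "%{COMBINEDAPACHELOG}"},
    {"type": "customer-webapp", "regex": "%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:msg}"},
]

# trim_blocks and lstrip_blocks stop the {% ... %} lines from leaving
# blank lines and stray indentation in the rendered text.
env = jinja2.Environment(trim_blocks=True, lstrip_blocks=True)
combined = env.from_string(COMBINED_TEMPLATE).render(filters=filters)
print(combined)
```

Whether one combined file or one file per type is preferable depends on how you manage your Logstash configuration; the rest of this article sticks with one file per type.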

Python Processor

In order to use the Jinja2 template engine, first install it using pip:

> sudo apt-get install python-pip -y
> sudo pip install jinja2

Then create the ‘makeLogstashFilters.py’ file:

#!/usr/bin/python
#
# prereq:
# sudo apt-get install python-pip -y
# sudo pip install jinja2
#
import sys
import json
import os
import jinja2

def render(tpl_path, context):
    path, filename = os.path.split(tpl_path)
    return jinja2.Environment(
        loader=jinja2.FileSystemLoader(path or './')
    ).get_template(filename).render(context)


# load json from file
jsonConfigName = "filters.json"
print("jsonConfigName: " + jsonConfigName)
with open(jsonConfigName) as json_file:
    json_data = json.load(json_file)
    #print(json_data)

# iterate through each json filter entry
for fileEntry in json_data['filters']:

  # put entire json entry into jinja context for merging
  context = fileEntry

  print("================================================")

  # get template name, output file name
  templateFileName = fileEntry['template']
  outputFileName = "filter-" + fileEntry['type'] + ".conf"
  print("outputFileName: " + outputFileName)

  # merge template with data
  result = render(templateFileName,context)

  # write output to file; the with-block closes it even if the write fails
  with open(outputFileName, "w") as outFile:
    outFile.write(result)

print("================================================")

Now make the Python script executable, and run it:

> chmod ugo+r+x ./makeLogstashFilters.py
> ./makeLogstashFilters.py

The output should look like the listing below. If you open the generated Logstash filter files, you will see that they follow the format described in the first section, with each entry's type and regex filled in.

jsonConfigName: filters.json
================================================
outputFileName: filter-apache-access.conf
================================================
outputFileName: filter-customer-webapp.conf
================================================
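
One caveat worth knowing when a JSON file drives the templates: by default Jinja2 renders an undefined variable as an empty string, so an entry that is missing its regex key would silently yield a filter with an empty pattern. A minimal sketch, assuming you would rather fail fast, uses jinja2.StrictUndefined. The inline template text and the incomplete entry are made up for the demonstration.

```python
import jinja2

# One-line stand-in for logstash-template.conf (illustrative only)
template_text = (
    'if [type]=="{{type}}" { grok { match => { "message" => "{{regex}}" } } }'
)

# Entry with the "regex" key deliberately left out
entry = {"type": "apache-access"}

# Default behaviour: the missing variable renders as an empty string
lenient = jinja2.Environment().from_string(template_text).render(entry)
print(lenient)

# StrictUndefined turns the same omission into an exception
strict_env = jinja2.Environment(undefined=jinja2.StrictUndefined)
try:
    strict_env.from_string(template_text).render(entry)
    strict_failed = False
except jinja2.UndefinedError as err:
    strict_failed = True
    print("render failed: " + str(err))
```

Passing undefined=jinja2.StrictUndefined to the Environment created inside the render function would apply the same check to the script above.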

REFERENCES

https://www.elastic.co/guide/en/logstash/current/filter-plugins.html

http://www.json.org/

http://jinja.pocoo.org/docs/dev/

https://www.digitalocean.com/community/tutorials/adding-logstash-filters-to-improve-centralized-logging