Logstash: Using metrics to debug the filtering process

elastic-logstash-fw When building your logstash filter, you would often like to validate your assumptions on a large sampling of input events without sending all the output to ElasticSearch.

Using Logstash metrics and conditionals, we can easily show:

  • How many input events were processed successfully
  • How many input events had errors
  • An error file containing each event that processed in error

This technique gives you the ability to track your success rate across a large input set, and then do a postmortem review of each event that failed.

I’ll walk you through a Logstash conf file that illustrates this concept.

input {
  stdin {
    type => "testevents"
  }
} # input end


filter {
  # expecting a name and age on each line separated by space, e.g.:
  grok {
    match => { "message" => "%{DATA:name} %{NUMBER:age}" }
  }

  # total event count
  metrics {
    meter => [ "total_%{type}" ]
    add_tag => "metric"
  } # metrics

  # failure count
  if "_grokparsefailure" in [tags] {
    metrics {
      meter => [ "problem_%{type}" ]
      add_tag => "metric"
    } # metrics
  } # if grok failure


} # filter end


output {

  # write any problems to file for later review
  if "_grokparsefailure" in [tags] {
       file { path => "/tmp/errors.log" }
  }

  if "metric" in [tags] {
    # write progress to file for tail in real-time
    file { path => "/tmp/metrics.log" }
  }else {
    # true events can be routed wherever necessary
    stdout { codec => rubydebug }
  }


} # output end

I created this file at /opt/logstash/testmetrics.conf, so that I can easily start logstash from the console:

cd /opt/logstash
bin/logstash -f testmetrics.conf

Once you see the message “Pipeline main started”, then go ahead and type each of the lines below:

moe 52

larry 51

curly

This will produce output like below.  You will notice that “moe 52” and “larry 51” both parse properly, but our last line of input where we only typed “curly” without an age produces a grokparsefailure.

Pipeline main started
moe 52
{
       "message" => "moe 52",
      "@version" => "1",
    "@timestamp" => "2016-07-18T02:13:46.986Z",
          "type" => "testevents",
          "host" => "myhost",
          "name" => "moe",
           "age" => "52"
}
larry 41
{
       "message" => "larry 41",
      "@version" => "1",
    "@timestamp" => "2016-07-18T02:13:50.433Z",
          "type" => "testevents",
          "host" => "myhost",
          "name" => "larry",
           "age" => "41"
}
curly
{
       "message" => "curly",
      "@version" => "1",
    "@timestamp" => "2016-07-18T02:13:53.161Z",
          "type" => "testevents",
          "host" => "myhost",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Now look at the output that went to /tmp/errors.log, and you will see the exact message sent for the input event that failed:

{"message":"curly","@version":"1","@timestamp":"2016-07-18T02:13:53.161Z","type":"testevents","host":"myhost","tags":["_grokparsefailure"]}

And now look at the output that went to /tmp/metrics.log, and you can see the total number of events process was “count:3”

{“@version”:”1″,”@timestamp”:”2016-07-18T02:13:58.914Z”,”message”:”myhost”,“total_testevents”:{“count”:3,”rate_1m”:0.38400888292586466,”rate_5m”:0.3966942907643235,”rate_15m”:0.3988919696009794},”tags”:[“metric”]}

While the total number of failures was “count:1”

{“@version”:”1″,”@timestamp”:”2016-07-18T02:13:58.915Z”,”message”:”myhost”,“problem_testevents”:{“count”:1,”rate_1m”:0.2,”rate_5m”:0.2,”rate_15m”:0.2},”tags”:[“metric”]}