ELK: Using Ruby in Logstash filters

elastic-logstash-fwLogstash has a rich set of filters, and you can even write your own, but often this is not necessary since there is a out-of-the-box filter that allows you to embed Ruby code directly in the configuration file.

Using logstash-filter-ruby, you can use all the power of Ruby string manipulation to parse an exotic regular expression, an incomplete date format, write to a file, or even make a web service call.

Logstash Installation

If you haven’t installed Logstash already, refer to the official instructions here.

Sample event

Let’s use an example throughout this article of a log event with 3 fields:

  • timestamp with no date – 02:36.01
  • full path to source log file – /var/log/Service1/myapp.log
  • string – ‘Ruby is great’

The event looks like below, and we will use this in the upcoming examples.

02:36.01 /var/log/Service1/myapp.log Ruby is great

Assuming you have installed Logstash at “/opt/logstash”, create  “/opt/logstash/ruby-logstash.conf”:

input {
  stdin { }
}

filter {
} # filter

output {
  stdout { codec => rubydebug }
}

Now run logstash, and after a couple of seconds it should say “Pipeline main started” and will be waiting for input from standard input.  Paste in the full event line provided above.

$ bin/logstash -f ruby-logstash.conf

And the output should look like:

{
       "message" => "02:36.01 /var/log/Service1/myapp.log Ruby is great",
      "@version" => "1",
    "@timestamp" => "2017-04-24T12:39:07.659Z",
          "host" => "lstash1"
}

Since we have not provided any kind of filters, it simply echoes back our log events as the field “message”, and adds some metadata about the host and time in which the event was processed.

Now let’s do the minimal and break apart this single message into fields:

input {
  stdin { }
}

filter {

  grok {
    match => { "message" => "%{DATA:justtime} %{DATA:logsource} %{GREEDYDATA:msg}" }
  } #

} # filter

output {
  stdout { codec => rubydebug }
}

Run logstash again, give it our event line, and it should now report back with the extracted fields.

{
            "message" => "02:36.01 /var/log/Service1/myapp.log Ruby is great",
           "@version" => "1",
         "@timestamp" => "2017-04-24T14:01:38.769Z",
               "host" => "lstash1",
           "justtime" => "02:36.01",
          "logsource" => "/var/log/Service1/myapp.log",
    "msg" => "Ruby is great"
}

While these fields could be used as-is, the following sections show how we can improve upon each.

If you want the final version of ‘ruby-logstash.conf’, you can download it from my github project.

Ruby to parse an incomplete date

Notice that the time ’02:36.01′ is very sparse.

It does not provide a date, nor does it provide a timezone reference.  Presumably, we have to assume that the date is the current day, and only after examining the server settings and how it constructs log files do we know the real timezone.

Let’s add the following processing to our simple.conf file inside the filter.  First, we allow the standard logstash ‘date’ to handle the parsing (which will yield incorrect results), and then below that we use ‘ruby’ to augment with today’s date and then parse the results (which yields correct results).

  # incorrectly autopopulates to first day of year
  date {
    match => [ "justtime", "HH:mm.ss" ]
    target => "incorrectfulldatetime"
    timezone => "America/Los_Angeles"
  } # date

  # use ruby to augment with current day
  ruby {
    code => "
    event['fulldatetime'] = Time.now.strftime('%Y-%m-%d') + ' ' + event['justtime']
    "
  }
  date {
    match => [ "fulldatetime", "YYYY-MM-dd HH:mm.ss" ]
    target => "correctfulldatetime"
    timezone => "America/Los_Angeles"
  } # date

The results should look something like below:

{
                  "message" => "02:36.01 /var/log/services/service1/myapp.log Ruby is great",
                 "@version" => "1",
               "@timestamp" => "2017-04-24T14:15:22.870Z",
                     "host" => "lstash1",
                 "justtime" => "02:36.01",
                "logsource" => "/var/log/Service1/myapp.log",
           "msg" => "Ruby is great",
    "incorrectfulldatetime" => "2017-01-01T10:36:01.000Z",
             "fulldatetime" => "2017-04-24 02:36.01",
      "correctfulldatetime" => "2017-04-24T09:36:01.000Z"
}

Notice how “incorrectfulldatetime” is 2017-01-01, set for the first day of the year because no date part was provided.  And notice how “correctfulldatetime” uses 2017-04-24 (the current day).

We targeted a new field named “correctfulldatetime” above, but if you wanted this to update the standard timestamp field, you would simply target “@timestamp”.

Ruby to parse a complex expression

The source log file name, ‘/var/log/Service1/myapp.log’ has embedded data we want to extract.  The parent directory ‘Service1’ directory indicates the name of the service, and it could be nested arbitrarily deep and also needs to be lowercase.

And while this is certainly a case where we could use the standard logstash regular expression to extract the value, let’s use Ruby because you may have a situation where the standard regex don’t cover you or the expression is so complex that for maintenance you want to break down the problem into easily readable steps.

So we add the following to the filter of simple.conf:

  # split apart log source to extract service name
  ruby {
    code => "
      fpath = event['logsource'].split('/')
      event['serviceName'] = fpath[fpath.length-2].downcase
    "
  }

And now we get the below results, which show the extracted ‘serviceName’ set to ‘service1’.

{
                  "message" => "02:36.01 /var/log/Service1/myapp.log Ruby is great",
                 "@version" => "1",
               "@timestamp" => "2017-04-24T09:36:01.000Z",
                     "host" => "lstash1",
                 "justtime" => "02:36.01",
                "logsource" => "/var/log/Service1/myapp.log",
           "msg" => "Ruby is great",
    "incorrectfulldatetime" => "2017-01-01T10:36:01.000Z",
             "fulldatetime" => "2017-04-24 02:36.01",
              "serviceName" => "service1"
}

 Ruby to write to a local file

Whether for debugging, metrics, or instrumentation, you may find it useful to have Logstash running as a service write out field values to the local disk.  Use file locks if you absolutely need atomicity.

# append msg field to disk
ruby {
  code => "
    File.open('/tmp/mydebug.log','a') { |f| f.puts event['msg'] }
  "
}

In Logstash 5.x, the direct method of accessing fields is deprecated and instead you should use the event API.  This would look like:

# append msg field to disk
ruby {
  code => "
    File.open('/tmp/mydebug.log','a') { |f| f.puts event.get('msg') }
  "
}

Ruby to make a web service call

If you are working on an advanced integration, you may want to take the message “Ruby is great” and do some kind of REST/Web/network lookup based on that information.

For our simple purposes, let’s take the first word of the message “Ruby” and call a REST based echo service that returns results in json format.  Here is the code you would add to the filter:

   # call out to REST based echo service
   ruby {
     init => "
       require 'net/http'
       require 'json'
     "
     code => "
       firstWord = event['msg'].split(' ')[0]
       uri = URI.parse('http://echo.jsontest.com/word/' + firstWord)
       response = Net::HTTP.get_response(uri)
       if response.code == '200'
         result = JSON.parse(response.body)
         returnWord = result['word']
         event['echo'] = firstWord + ' echoed back as: ' + returnWord
       else
         event['echo'] = 'ERROR reaching web service'
       end
     "
   }

And below you can see the result come back in the ‘echo’ field:

{
                  "message" => "02:36.01 /var/log/Service1/myapp.log Ruby is great",
                 "@version" => "1",
               "@timestamp" => "2017-04-24T09:36:01.000Z",
                     "host" => "lstash1",
                 "justtime" => "02:36.01",
                "logsource" => "/var/log/Service1/myapp.log",
                      "msg" => "Ruby is great",
    "incorrectfulldatetime" => "2017-01-01T10:36:01.000Z",
             "fulldatetime" => "2017-04-24 02:36.01",
              "serviceName" => "service1",
                     "echo" => "Ruby echoed back as: Ruby"
}

 

If you want the final version of ‘ruby-logstash.conf’, you can download it from my github project.

 

REFERENCES

https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html

https://github.com/logstash-plugins/logstash-filter-ruby

https://github.com/logstash-plugins

http://blog.comperiosearch.com/blog/2015/04/10/how-to-develop-logstash-configuration-files/

http://www.jsontest.com/#echo

https://gist.github.com/timsavery/1657351