When building complex, real-world Logstash filters, there can be a fair bit of processing logic. There are typically multiple grok patterns as well as fields used as flags for conditional processing.
The problem is that these intermediate extracted fields and processing flags are often ephemeral and unnecessary in your ultimate persistent store (e.g. Elasticsearch), but they will be inserted as fields unless you explicitly remove them.
One strategy is to use a mutate filter at the very end of the pipeline to remove any extra fields. A cleaner strategy, which we describe here, is to declare these variables as @metadata fields so they are never even considered for persistence.
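For comparison, the remove-at-the-end strategy might look something like the sketch below ('myname' and 'foundtype' are placeholder field names, not part of the example that follows):

```
filter {
  # ... earlier filters populate 'myname' and 'foundtype' ...

  # last filter: strip the intermediate fields before output
  mutate {
    remove_field => [ "myname", "foundtype" ]
  }
}
```

This works, but every new intermediate field has to be remembered and added to the remove_field list, which is exactly the bookkeeping the @metadata approach avoids.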
Use @metadata in grok
Below is a simple example of a grok filter that is used to parse a message that looks like “hello world”, and puts “world” into the ‘myname’ field.
grok {
  match => { "message" => "hello %{GREEDYDATA:myname}" }
  break_on_match => false
}
Now, you could use the value in the “myname” field to do conditional processing, populate other fields/tags, etc.
But unless you explicitly removed it with a mutate, this field would be passed to the persistent store (e.g. ElasticSearch) and would be stored, analyzed, and indexed. This may be what you want, but in case you really only want this value as an ephemeral conditional flag, you could instead use:
grok {
  match => { "message" => "hello %{GREEDYDATA:[@metadata][myname]}" }
  break_on_match => false
}
The field [@metadata][myname] will never be inserted into your persistent store, but you can use it the same way you reference any field.
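For example, you can test the metadata field in a conditional or interpolate it with sprintf syntax just like any other field (a hypothetical snippet; 'greeting_target' is a made-up field name):

```
filter {
  if [@metadata][myname] == "world" {
    mutate {
      add_field => { "greeting_target" => "%{[@metadata][myname]}" }
    }
  }
}
```

Inside the pipeline, [@metadata][myname] behaves like a normal field; it simply never reaches the output.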
Using @metadata in grok and conditionals
A contrived example is the best way to show metadata fields in use.
The below filter is looking for messages that either look like “hello <name>” or “bye <name>”, and it uses a metadata field as the conditional flag. If it already found a match, then it doesn’t bother with evaluating another grok match.
Finally, it creates a “description” field based on the final value of that metadata processing flag.
input {
  stdin { }
}

filter {
  # initialize metadata field used as flag
  mutate {
    add_field => { "[@metadata][foundtype]" => "" }
  }

  # try to match 'hello' looking messages
  if "" == [@metadata][foundtype] {
    grok {
      match => { "message" => "hello %{GREEDYDATA:[@metadata][myname]}" }
      break_on_match => false
      add_field => { "[@metadata][foundtype]" => "hellotype" }
      add_tag => [ "didhello" ]
    }
  }

  # try to match 'bye' looking messages
  if "" == [@metadata][foundtype] {
    grok {
      match => { "message" => "bye %{GREEDYDATA:[@metadata][myname]}" }
      break_on_match => false
      add_field => { "[@metadata][foundtype]" => "byetype" }
      add_tag => [ "didbye" ]
    }
  }

  # add description based on flag
  if !("" == [@metadata][foundtype]) {
    mutate {
      add_field => { "description" => "action performed by %{[@metadata][myname]}" }
    }
  } else {
    mutate {
      add_field => { "description" => "this was not a hello or bye message type" }
    }
  }
} # filter

output {
  stdout { codec => rubydebug }
}
You can copy-paste the lines above, or download it from github as logstash-metadata.conf. Then run logstash:
$ bin/logstash -f logstash-metadata.conf
When you type ‘hello world’, the output event looks something like:
{
    "@timestamp" => 2017-05-01T16:53:15.894Z,
      "@version" => "1",
          "host" => "trusty1",
   "description" => "action performed by world",
       "message" => "hello world",
          "tags" => [
        [0] "didhello"
    ]
}
This shows that neither the [@metadata][foundtype] nor the [@metadata][myname] field is persisted in the final output. However, you can see the [@metadata][myname] value show up as the last word in the description, as we specified.
We use the added tag simply as a debug tool to prove what processing took place.
And if you type ‘bye jack’, you get:
{
    "@timestamp" => 2017-05-01T16:53:18.599Z,
      "@version" => "1",
          "host" => "trusty1",
   "description" => "action performed by jack",
       "message" => "bye jack",
          "tags" => [
        [0] "_grokparsefailure",
        [1] "didbye"
    ]
}
This shows a different processing path, but again the @metadata fields are not persisted. Note the "_grokparsefailure" tag: it can be ignored here, and appears because the "hello" grok block was evaluated first and did not match.
Now if we type “junk” as the last input event, we get:
{
    "@timestamp" => 2017-05-01T17:36:26.605Z,
      "@version" => "1",
          "host" => "trusty1",
   "description" => "this was not a hello or bye message type",
       "message" => "junk",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
Here the @metadata flag told us the message was of neither type, and a different 'description' message was constructed altogether. Once again the _grokparsefailure tag can be ignored; it is a side effect of grok filters that were evaluated but did not match.
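If you want to confirm what is actually sitting in @metadata while debugging, the rubydebug codec accepts a metadata option that prints those fields to stdout (they are still never sent to any other output):

```
output {
  stdout { codec => rubydebug { metadata => true } }
}
```

This is handy for verifying flag values like [@metadata][foundtype] without having to temporarily copy them into regular fields.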
REFERENCES
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html
https://www.elastic.co/blog/logstash-metadata
https://www.elastic.co/guide/en/beats/filebeat/1.1/metadata-missing.html
https://www.elastic.co/guide/en/logstash/current/event-api.html
https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns