ELK: Custom template mappings to force field types

It is very common to have Logstash create time-based indexes in Elasticsearch that follow the format <indexName>-YYYY.MM.dd.  This means all events whose @timestamp falls on the same day (in UTC) go to the same index.
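
To make the naming rule concrete, here is a minimal sketch that derives the daily index name from an event timestamp.  The function name daily_index_name is hypothetical (it is not part of Logstash), and it assumes the timestamp is an ISO 8601 string already in UTC, like the @timestamp values used later in this article.

```shell
# Hypothetical helper: build "<prefix>-YYYY.MM.dd" from an ISO 8601 UTC timestamp.
daily_index_name() {
  local prefix="$1"
  local ts="$2"
  # Keep the date part (first 10 characters) and swap '-' for '.'.
  printf '%s-%s\n' "$prefix" "$(printf '%s' "$ts" | cut -c1-10 | tr '-' '.')"
}
```

Calling daily_index_name test 2017-05-01T17:36:26.605Z prints test-2017.05.01, which is the index used in the first example below.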

However, if you do not explicitly specify an index template that maps each field to a type, you can end up with unexpected query results.  Without explicit mappings, each index (created fresh every day) guesses a field’s type from the first value it sees for that field, and that guess then stays fixed for the life of the index.

In this article, I’ll show you how to create explicit custom index templates so that field types are uniform across your time-series indexes.

Example

Let’s prove this out with a real example.  As a prerequisite, make sure curl is installed so we can make direct REST calls against Elasticsearch, along with jq so we can pretty-print the JSON responses.

$ sudo apt-get install curl jq -y

We’ll create a new index for our ‘test’ type, and we will insert the following fields:

  • @timestamp – standard timestamp for event, Date
  • myid – integer specifying unique id of some sort, Number
  • myname – name of person, String
  • mydate – full timestamp, Date
  • isStudent – whether person is a student, Boolean

First, we post an event directly to Elasticsearch, which automatically creates the new index with a best guess as to the field types:

$ curl -XPOST http://127.0.0.1:9200/test-2017.05.01/test -d '{"@timestamp":"2017-05-01T17:36:26.605Z", "myid":1, "myname": "adam", "isStudent": true, "mydate":"2017-05-01T12:36:26.605Z" }' --silent | jq .
{
  "created": true,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 1
  },
  "_version": 1,
  "_id": "AVvGUvnPlDjGCQXkG2_D",
  "_type": "test",
  "_index": "test-2017.05.01"
}

Now query the index mappings, and we can see that the best guesses made by Elasticsearch all look good: myid is of type long, isStudent is boolean, and mydate is a date.

$ curl -XGET 'http://127.0.0.1:9200/test-2017.05.01/_mappings?pretty=1'
{
  "test-2017.05.01": {
    "mappings": {
      "test": {
        "properties": {
          "myname": {
            "type": "string"
          },
          "myid": {
            "type": "long"
          },
          "mydate": {
            "format": "strict_date_optional_time||epoch_millis",
            "type": "date"
          },
          "isStudent": {
            "type": "boolean"
          },
          "@timestamp": {
            "format": "strict_date_optional_time||epoch_millis",
            "type": "date"
          }
        }
      }
    }
  }
}

However, now let’s pretend it is the next day and time to create another index.  This time, the first event of the day has some empty fields, and the data is not as clean as we would expect: “myid” holds the placeholder “n/a” instead of an integer, “isStudent” is the string “False” rather than a JSON boolean, and “mydate” is an empty string.

$ curl -XPOST http://127.0.0.1:9200/test-2017.05.02/test -d '{"@timestamp":"2017-05-02T17:36:26.605Z", "myid":"n/a", "myname": "robert", "isStudent": "False", "mydate":"" }' --silent | jq .
{
  "created": true,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 1
  },
  "_version": 1,
  "_id": "AVvGWLhLlDjGCQXkG2_E",
  "_type": "test",
  "_index": "test-2017.05.02"
}

Now when we look at the mappings for this index, they are far from accurate: every custom field was created as a string.  No matter how many well-formed events are submitted during the rest of the day, this index will always treat these fields as strings.

$ curl -XGET 'http://127.0.0.1:9200/test-2017.05.02/_mappings?pretty=1'
{
  "test-2017.05.02": {
    "mappings": {
      "test": {
        "properties": {
          "myname": {
            "type": "string"
          },
          "myid": {
            "type": "string"
          },
          "mydate": {
            "type": "string"
          },
          "isStudent": {
            "type": "string"
          },
          "@timestamp": {
            "format": "strict_date_optional_time||epoch_millis",
            "type": "date"
          }
        }
      }
    }
  }
}

This is not going to work well when querying across multiple days of data: you expect to sort or slice by dates, integers, and booleans, but this index only knows string types.

The same problem affects other standard field types such as booleans, numbers, IP addresses, and geopoints.  IP addresses and geopoints are never detected automatically, so they always need an explicit mapping.
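
The guessing behaviour seen in the two examples above can be imitated with a small sketch.  This is a deliberately simplified stand-in for Elasticsearch 2.x dynamic mapping, not its actual implementation: the function name guess_type is hypothetical, it takes one raw JSON value, and it only recognises ISO 8601 style dates.

```shell
# Hypothetical helper: print the type a 2.x index would likely pick for a raw
# JSON value the first time it sees a field. Simplified on purpose.
guess_type() {
  local v="$1"
  case "$v" in
    null) echo none ;;                 # null values do not create a field
    true|false) echo boolean ;;        # unquoted JSON booleans only
    \"*\")                             # any quoted JSON string
      if printf '%s' "$v" | grep -Eq '^"[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9:.]+(Z|[+-][0-9]{2}:?[0-9]{2})?)?"$'; then
        echo date                      # string that looks like an ISO 8601 date
      else
        echo string                    # everything else quoted stays a string
      fi ;;
    *)
      if printf '%s' "$v" | grep -Eq '^-?[0-9]+$'; then
        echo long                      # whole numbers
      elif printf '%s' "$v" | grep -Eq '^-?[0-9]+\.[0-9]+$'; then
        echo double                    # decimal numbers
      else
        echo other                     # objects, arrays, anything unhandled
      fi ;;
  esac
}
```

With this sketch, guess_type 1 prints long while guess_type '"n/a"' prints string, and guess_type '"False"' also prints string because the value is quoted.  That matches the difference between the two days’ mappings shown above.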

Custom Index Template

The solution to this problem is custom index templates.  By creating an explicit mapping, you can guarantee that all indices created in that time series format will have the same field types.  Execute the command below to create the template (the PUT body spans multiple lines):

$ curl -XPUT http://127.0.0.1:9200/_template/test_template -d '{
    "template" : "test*",
    "mappings" : {
      "test" : {
        "properties": {
          "@timestamp":{"type":"date","format":"dateOptionalTime"},
          "myid":{"type":"integer"},
          "myname":{"type":"string", "index":"not_analyzed"},
          "isStudent":{"type":"boolean"},
          "mydate":{"type":"date"}
        }
      }
    }
}'
{"acknowledged":true}

Note that 5.x introduced breaking mapping changes; this example uses the older 2.x mapping types.
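
For readers on 5.x, the change that affects this particular template is that the string type was split into text and keyword, with keyword taking the place of a not_analyzed string.  The sketch below prints what the equivalent 5.x template body might look like.  The function name template_body_5x is hypothetical, and the body assumes nothing else in the template needs to change, so check it against the documentation for your version.

```shell
# Hypothetical helper: print a 5.x flavoured version of the template body.
# Assumption: only "myname" changes ("keyword" replaces not_analyzed string).
template_body_5x() {
  cat <<'EOF'
{
  "template": "test*",
  "mappings": {
    "test": {
      "properties": {
        "@timestamp": {"type": "date", "format": "dateOptionalTime"},
        "myid":       {"type": "integer"},
        "myname":     {"type": "keyword"},
        "isStudent":  {"type": "boolean"},
        "mydate":     {"type": "date"}
      }
    }
  }
}
EOF
}
```

The output can be piped straight into curl, for example: template_body_5x | curl -XPUT http://127.0.0.1:9200/_template/test_template -H 'Content-Type: application/json' -d @-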

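Templates are only applied when an index is created, so the two existing indexes keep the mappings they already have; only new test* indexes will pick up the template.  The refresh below is optional: it makes recently indexed documents visible to search and does not change any mappings.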

$ curl -XPOST 'http://127.0.0.1:9200/test*/_refresh?pretty=1'
{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  }
}

Then retrieve the template to double-check it is defined as expected:

$ curl -XGET http://127.0.0.1:9200/_template/test_template?pretty=1
{
  "test_template" : {
    "order" : 0,
    "template" : "test*",
    "settings" : { },
    "mappings" : {
      "test" : {
        "properties" : {
          "mydate" : {
            "type" : "date"
          },
          "@timestamp" : {
            "format" : "dateOptionalTime",
            "type" : "date"
          },
          "myid" : {
            "type" : "integer"
          },
          "myname" : {
            "index" : "not_analyzed",
            "type" : "string"
          },
          "isStudent" : {
            "type" : "boolean"
          }
        }
      }
    },
    "aliases" : { }
  }
}

Now we will create an index for yet another day, again with incomplete data.

$ curl -XPOST http://127.0.0.1:9200/test-2017.05.03/test -d '{"@timestamp":"2017-05-03T16:36:26.605Z", "myid":null, "myname": "phillip", "isStudent": null, "mydate":null }' --silent | jq .
{
  "created": true,
  "_shards": {
    "failed": 0,
    "successful": 1,
    "total": 1
  },
  "_version": 1,
  "_id": "AVvGrsPalDjGCQXkG2_u",
  "_type": "test",
  "_index": "test-2017.05.03"
}

But notice that this time the field types come from the template rather than from guesswork, which is what we want.

$ curl -XGET 'http://127.0.0.1:9200/test-2017.05.03/_mappings?pretty=1'
{
  "test-2017.05.03" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "isStudent" : {
            "type" : "boolean"
          },
          "mydate" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "myid" : {
            "type" : "integer"
          },
          "myname" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

And if you were to try to insert incorrect types, such as an empty string “” for the date or “n/a” for the integer field myid, you would get an HTTP 400 response with an explanation of which field was invalid.
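
One way to see the rejection for yourself is to print only the HTTP status code of an insert.  The helper below is a sketch: post_status is a hypothetical name, and the ES_URL variable with its default of http://127.0.0.1:9200 is an assumption based on the address used throughout this article.

```shell
# Hypothetical helper: POST a JSON document to <index>/test and print only
# the HTTP status code returned by Elasticsearch.
post_status() {
  local index="$1"
  local doc="$2"
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    -XPOST "${ES_URL:-http://127.0.0.1:9200}/${index}/test" -d "$doc"
}
```

For example, post_status test-2017.05.03 '{"@timestamp":"2017-05-03T17:36:26.605Z","myid":"n/a"}' should print the 400 described above once the template is in place.  Drop the --output /dev/null option to read the error explanation as well.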

Note that if Logstash attempted such inserts, you would see “MapperParsingException” in /var/log/logstash/logstash.log, and that log would grow very quickly if a large number of inserts were being attempted.
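
A quick way to gauge how often this is happening is to count the matching log lines.  The sketch below uses a hypothetical helper name, count_mapper_errors, and defaults to the log path mentioned above, which may differ on your install.

```shell
# Hypothetical helper: count log lines that mention MapperParsingException.
# The default path is the one named in the text; pass another file if needed.
count_mapper_errors() {
  local logfile="${1:-/var/log/logstash/logstash.log}"
  grep -c 'MapperParsingException' "$logfile"
}
```

Running it twice a few minutes apart gives a rough idea of how quickly the rejected inserts are piling up.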

Kibana

Once these custom index templates are in place in Elasticsearch and your new indexes are being created with the proper types, go into Kibana Settings > Index Patterns and refresh the field list so that the field types are recognized.
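
Before refreshing, it can help to confirm that a field has the same type in every index the pattern matches, since older indexes created before the template keep their guessed types.  The sketch below wraps the get field mapping API.  The name field_mapping is hypothetical, and ES_URL and the default test-* pattern are assumptions taken from the examples in this article.

```shell
# Hypothetical helper: show how one field is mapped in every matching index.
field_mapping() {
  local field="$1"
  local pattern="${2:-test-*}"
  curl --silent -XGET \
    "${ES_URL:-http://127.0.0.1:9200}/${pattern}/_mapping/field/${field}?pretty=1"
}
```

Running field_mapping myid against the three indexes from this article should show the field mapped differently in test-2017.05.02 than in the other two.  Reindexing or deleting that index is the way to clear the mismatch, because the mapping of an existing field cannot be changed in place.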

REFERENCES

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty=1' (cluster health)

curl -XDELETE http://127.0.0.1:9200/_template/test_template (delete template)

curl -XGET 'http://localhost:9200/_mapping?pretty=true' (show all mappings)

curl -XGET 'http://localhost:9200/_template?pretty=true' (show all templates)