CloudFoundry: Extracting Java multiline exception stack traces from Logback and Log4j2 using Logstash

Cloud Foundry deployed Java applications can send log events to stdout/stderr and then a bound syslog drain can send this to a logging solution like ELK for ingestion.

One area that has always been tricky when dealing with logging is multi-line Java stack traces.  By default, because each line in the stack trace has a line feed character, they are sent as individual log events.  In other words, Cloud Foundry will send a 100 line stack trace as 100 individual syslog messages.

To avoid this problem when using Spring/Spring Boot with a logging implementation of either Logback or Log4j2, I have written an article here that shows exactly how to format logback.xml or log4j2.xml to collapse the stack trace.  In summary, the stack trace newline characters are swapped for \u2028 Unicode characters which means an exception is sent as a single unit.

In this article, I will show how to parse the events coming from an Cloud Foundry deployed application’s syslog drain, reconstituting the stack traces to their original format by translating back to a new line character.

Prerequisites

It is assumed that you have built the customized spring-music project that collapses multi-line stack traces into a single line.  Here is a link to my article showing the exact steps.

Then you need to do a “cf push” of this application to a Cloud Foundry instance.  I am going to recommend that you use a local Cloud Foundry instance because you have to direct its syslog drain to a Logstash instance.  You can use either install PCF Dev as described here, or install CF on top of BOSH Lite as described here.

With a local CF instance deployed, you can follow the instructions here for deploying spring-music to a CF instance, with the understanding that your local API endpoint will not be api.run.pivotal.io, instead it will be your local CF API endpoint like api.local.pcfdev.io or api.bosh-lite.com.

Install and run Logstash

Since you have VirtualBox running locally for the local CF instance, go ahead and create an Ubuntu 16.04 VM and install Logstash 6.x as described here.   You could also install Logstash on Ubuntu 14.04 if you wanted.

Grab the “cf-parse-5x.conf” file from my github project, which is a custom configuration file for Logstash 5.3+ and copy it to “/usr/share/logstash”.

$ cd /usr/share/logstash
$ sudo wget https://raw.githubusercontent.com/fabianlee/spring-music/master/src/main/resources/cf-logstash/cf-parse-5x.conf

Then run logstash using this configuration file:

$ sudo bin/logstash -f cf-parse-5x.conf

The configuration file has a syslog listener as input, which is now listening on port 514 for any events coming from the CF syslog drain of the application.   And any Unicode \2028 characters (used to collapse stack traces), are now translated back to new lines.

ruby {
   code => "
     event.get('message').gsub!(/u2028/,%Q[\n])
   "
}

If you are using an older 2.x version of Logstash, then you should use this conf file instead.

Setup CF application drain

Just as described in my article on CF logging, you will want to Create a User Provided Service (cups) that is a syslog drain to the address of the Logstash server just created.  Then you bind the deployed spring-music application to that service.

$ cf cups logstash -l syslog://<logstashServer>:514

$ cf bind-service spring-music logstash

$ cf restage spring-music

Almost immediately you should start seeing messages at the Logstash console.  If you don’t, check the Logstash firewall and make sure port 514 is open (sudo ufw allow 514) and restart Logstash.

Validate regular application logs

Using your browser, go to the route of the locally deployed spring-music app and you will get parsed messages like below where each field in the message has been extracted.

{
 "severity_label" => "Emergency",
 "syslog5424_pri" => "14",
 "log_level" => "DEBUG",
 "cf_instance" => "0",
 "syslog5424_ts" => "2018-03-10T01:19:54.059935+00:00",
 "kv_instance_id" => "0",
 "message" => "Getting all albums at DEBUG level",
 "tags" => [
 [0] "syslog"
 ],
 "facility_label" => "kernel",
 "syslog5424_ver" => "1",
 "linenumber" => "26",
 "cf_log_type" => "APP",
 "kv_app_version" => "1.0",
 "@timestamp" => 2018-03-10T01:19:54.815Z,
 "classname" => "org.cloud.sampl.music.web.AlbumController",
 "syslog5424_app" => "0066630b-b9e5-469b-9556-6b0215bb71af",
 "syslog5424_host" => "cloudfoundry.development.spring-music",
 "use_index" => "APP_ENHANCED",
 "severity" => 0,
 "facility" => 0,
 "syslog_length" => "340",
 "@version" => "1",
 "multiexception" => "\n",
 "syslog5424_proc" => "[APP/PROC/WEB/0]",
 "kv_app_name" => "spring-music",
 "priority" => 0,
 "app_ts" => "2018-03-10T01:19:54.058+0000",
 "threadname" => "http-nio-8080-exec-6",
 "host" => "192.168.2.108"
}

Validate multi-line stack exception

Now visit the “/errors/throw” URL of the locally deployed application and you will see that the ‘multiexception’ field contains the full stack trace for the NullPointerException (I abbreviated it for sake of room).

Also notice that it contains “\n” newline characters, which means it has taken on its original format.

{
 "severity_label" => "Emergency",
 "syslog5424_pri" => "14",
 "log_level" => "ERROR",
 "cf_instance" => "0",
 "syslog5424_ts" => "2018-03-10T01:21:30.667066+00:00",
 "kv_instance_id" => "0",
 "message" => "Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.NullPointerException: Forcing an exception to be thrown] with root cause",
 "tags" => [
 [0] "syslog"
 ],
 "facility_label" => "kernel",
 "syslog5424_ver" => "1",
 "linenumber" => "181",
 "cf_log_type" => "APP",
 "kv_app_version" => "1.0",
 "@timestamp" => 2018-03-10T01:21:30.887Z,
 "classname" => "org.apach.catal.core.Conta.[Tomc.[loca.[/].[dispatcherServlet]",
 "syslog5424_app" => "0066630b-b9e5-469b-9556-6b0215bb71af",
 "syslog5424_host" => "cloudfoundry.development.spring-music",
 "use_index" => "APP_ENHANCED",
 "severity" => 0,
 "facility" => 0,
 "syslog_length" => "10557",
 "@version" => "1",
 "multiexception" => "java.lang.NullPointerException: Forcing an exception to be thrown\n\tat org.cloudfoundry.samples.music.web.ErrorController.throwException(ErrorController.java:22) ~[classes/:?]\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_162]\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_162]\n\tat 
...
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-8.5.15.jar!/:8.5.15]\n\tat java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]\n\n",
 "syslog5424_proc" => "[APP/PROC/WEB/0]",
 "kv_app_name" => "spring-music",
 "priority" => 0,
 "app_ts" => "2018-03-10T01:21:30.650+0000",
 "threadname" => "http-nio-8080-exec-8",
 "host" => "192.168.2.108"
}

 

 

REFERENCES

https://docs.cloudfoundry.org/loggregator/architecture.html#metron (CF logging arch)

http://docs.pivotal.io/pivotalcf/2-0/devguide/deploy-apps/streaming-logs.html#format (streaming log types)

https://docs.pivotal.io/tiledev/2-0/nozzle.html#examples (log format for CF components =<${PRI}>${VERSION} ${TIMESTAMP} ${HOST_IP} ${APP_NAME} ${PROD_ID} ${MSG_ID} ${SD-ELEMENT-instance} ${MESSAGE})

http://grokconstructor.appspot.com/do/construction (online grok)

https://github.com/cloudfoundry/dropsonde-protocol (dropsonde protocol buffers)

https://www.elastic.co/guide/en/logstash/current/plugins-codecs-protobuf.html (logstash protobuf input)

https://github.com/cloudfoundry-community/firehose-to-syslog (nozzle that resolves app name, space UUID/name, org UUID/name)

http://schd.ws/hosted_files/cfsummit2016/f7/MonitoringCloudFoundry-LearningAboutTheFirehose.pdf (shows UDP and TCP portions of CF logging system)

https://github.com/cloudfoundry-community/logsearch-for-cloudfoundry (ELK stack for CF)

https://discuss.pivotal.io/hc/en-us/articles/223207207-Why-Loggregator-may-lose-logs (predicted loss for msgs/sec)

http://scottfrederick.cfapps.io/blog/2014/02/20/cloud-foundry-and-logstash (how to break apart syslog 5424 event)

https://logz.io/blog/cloud-foundry-elk-stack/ (breaking apart 5424 syslog, binding service, types of kibana graphs)

https://docs.cloudfoundry.org/loggregator/cli-plugin.html (CF CLI plugin for viewing firehost)

https://discuss.elastic.co/t/match-and-replace-unicode-characters/56656 (idea for using ruby to do unicode replacement, mutate did not seem to work)

https://www.diycode.cc/projects/cloudfoundry/loggregator (bosh manifest and logging)

https://www.elastic.co/guide/en/logstash/5.0/breaking-changes.html (ruby event.get() breaking change)

 

 

https://github.com/swisscom/cf-buildpack-logstash (if you want to experiment with CF deployed logstash)

 

NOTES

Loggregator architecture