Last time I mentioned that I was working on a central syslog. Part of the task was also the ability to easily go through the logs, preferably with some filtering and whatnot. The ELK stack is usually the first thing mentioned as a potential solution. Essentially, the goal is to land your logs in Elasticsearch. The problem with this approach lies in the processing part: with Logstash things can go very wrong very quickly, and there's only a handful of things other than _grokparsefailure that can put me into rage mode as fast.

Setup

I will not cover setting up Elasticsearch -- there are tons of tutorials out there in the wild already and, besides, there are so many setup options (and needs along with them) that it's essentially pointless. For my own testing purposes, though, I just spun up a Docker container[1] and pushed stuff towards it. The same goes for Kibana, which I used as an interface to Elasticsearch for searching through the logs.

I really wanted to avoid two things:

  1. Some weird, third-party log shipping things.[2]
  2. Logstash.[3]

Fortunately it turned out that rsyslog can speak to Elasticsearch directly via the omelasticsearch module. What's even better, contrary to syslog-ng's implementation, rsyslog doesn't require any Java libraries for this to work, as it defaults to HTTP to push the data. That's a big win IMHO. This, plus being pre-installed on pretty much every major distribution, makes it a no-brainer.

Apart from rsyslog itself, an additional module is necessary to be able to specify Elasticsearch as an output destination. On Debian/Ubuntu it's part of the rsyslog-elasticsearch package. Once it's installed, the entire configuration can be stored in a single file. As rsyslog by default includes everything from /etc/rsyslog.d/*.conf, I simply created a new file in there, called 00-elasticsearch.conf, with the following content:

module(load="omelasticsearch")

template(name="plain-syslog" type="list" option.json="on") {
    constant(value="{")
    constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"host\":\"")        property(name="hostname")
    constant(value="\",\"severity-num\":")  property(name="syslogseverity")
    constant(value=",\"facility-num\":")    property(name="syslogfacility")
    constant(value=",\"severity\":\"")      property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
    constant(value="\",\"syslogtag\":\"")   property(name="syslogtag")
    constant(value="\",\"message\":\"")     property(name="msg")
    constant(value="\"}")
}

template(name="logstash-index" type="string" string="logstash-%$YEAR%.%$MONTH%.%$DAY%")

action(type="omelasticsearch"
  template="plain-syslog"
  searchIndex="logstash-index"
  dynSearchIndex="on"
  bulkmode="on"
  errorfile="/var/log/omelasticsearch.log")

So, a few words about this setup. On line 1, the omelasticsearch module is loaded so rsyslog can talk to Elasticsearch. If the rsyslog-elasticsearch package is missing, this is already where things will fail once rsyslog is restarted. Lines 3 to 14 specify the template the messages are run through before being pushed to Elasticsearch. What is done here could be described as "jsonifying" everything that rsyslog produces, so it can be easily pushed to Elasticsearch, which will then know exactly how to store it. I took this setup from the official documentation almost entirely as is, the only difference being that I wanted to keep the numerical values for severity and facility.[4] Line 16 specifies one more template, this time for the index name. This part defines how the index in Elasticsearch is named, which can then be configured as an index pattern in Kibana or Grafana. I kept the name logstash simply because I'm used to it. I also kept the default, daily pattern of aggregation. Last but not least, lines 18 to 23 define the action that actually stores the logs in Elasticsearch: which template should be applied to the stream of logs coming from syslog (plain-syslog), which template should be used for the search index name (logstash-index), that dynSearchIndex should be turned on so the index name can use dynamic parts (like e.g. %$YEAR%.%$MONTH%.%$DAY%), that logs should be sent to Elasticsearch in bulk (I kept the defaults here), and that whenever there are any errors with sending those bulk requests, they should be thrown into the /var/log/omelasticsearch.log file.
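
One thing worth pointing out: the action above never says where Elasticsearch actually lives, so omelasticsearch falls back to its defaults of localhost and port 9200. If Elasticsearch sits on a different box, the action can be pointed at it explicitly via the server and serverport parameters. A minimal sketch, assuming a hypothetical host named es.example.com:

# es.example.com is a placeholder -- point it at wherever Elasticsearch lives
action(type="omelasticsearch"
  server="es.example.com"
  serverport="9200"
  template="plain-syslog"
  searchIndex="logstash-index"
  dynSearchIndex="on"
  bulkmode="on"
  errorfile="/var/log/omelasticsearch.log")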

With this setup in place, all the logs this host generates are going to land in an Elasticsearch index. Each day gets its own index, named logstash-$YEAR.$MONTH.$DAY, for example logstash-2019.03.25. This can then easily be picked up by Kibana with almost zero modification, presenting the data in a nice, easy to search and filter through manner.
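
Shipping literally every message may be more than you want, though. Since this is just a regular rsyslog action, it can be wrapped in a filter like any other one. A purely illustrative sketch that would only ship messages of warning severity or worse, instead of the unconditional action above:

# 4 = warning; lower severity numbers are more severe
if $syslogseverity <= 4 then {
    action(type="omelasticsearch"
      template="plain-syslog"
      searchIndex="logstash-index"
      dynSearchIndex="on"
      bulkmode="on"
      errorfile="/var/log/omelasticsearch.log")
}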

What's next?

Having nginx join the rsyslog stream, and now having rsyslog land in Elasticsearch, the only natural thing is to combine the two and profit. Other than that, I still haven't provided any insight into how to push from one rsyslog instance to another.[5] I'm also looking at the latest Grafana with its new Explore workflow, and at how I could potentially use it to replace Kibana.

Sources:

  1. Yes, single container, single instance. ↩︎

  2. I'm looking at you Beats. ↩︎

  3. In general. ↩︎

  4. Feel free to ditch these lines should you have no use for them. ↩︎

  5. Even though I mentioned creating central syslog in two or three different posts already. ↩︎