Grok patterns for Logstash: how to write them.


One of the most important Ops goals is checking that systems are healthy. And there is no easier way to do it than reading log files. You can do it directly from the system terminal, filtering with “grep”, “sed” and “awk”. But it’s easy to get confused and lose track of the real problem. Here is your personal helper – log management systems.

This article is about filtering log records with the “grok” pattern engine. Here you will find samples of grok patterns and Logstash filters. But first you have to understand the log processing pipeline as a whole.

What is grok?


Grok is a pattern engine for parsing logs against special templates. Log records from a particular piece of software basically share the same format. The idea of grok is to present well-structured log records for further automatic processing.
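
For example, a short pattern turns a raw record into named fields. A minimal sketch (the log line and the field names are invented for illustration): the record

127.0.0.1 GET /ping 200

matched against the pattern

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:status}

comes out as a structured document:

{ "client": "127.0.0.1", "method": "GET", "request": "/ping", "status": "200" }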

What is Logstash?


Logstash is log processing software. It lets you perform operations on any log file format. Essentially, Logstash is a bunch of filters, one per format. With grok patterns you can build filters with special features like time tracking, geoip and so on.

Generally, the whole log management setup consists of:

  1. Filebeat on the nodes; it ships the logs to the server.
  2. Logstash, which the nodes connect to via Filebeat over an SSL certificate. The main idea of Logstash is described above. Besides that, I’d like to highlight the input and output sections of the Logstash configuration: there you choose the interconnected software. For input it’s basically Filebeat, for output – Elasticsearch (see the sketch after this list).
  3. Elasticsearch as a document-oriented NoSQL database. You can use it for a myriad of other purposes as well. In this structure Elasticsearch receives JSON-formatted logs and stores them. You don’t really need any extra configuration if all the management tools live on one host.
  4. Kibana as a frontend application for log visualisation. It provides web-browser access to the Elasticsearch log storage. With Kibana you can discover logs, filter them as easily as it gets, and build graphs for any parameter. So on, so forth.
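
Here is a minimal sketch of those input and output sections (the port and certificate paths are my assumptions, adjust them to your setup):

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}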

Here is a clear ELK description and installation instruction. I just want to briefly point out the role grok patterns play in this scheme.

I’ve got JSON-formatted logs. So what?


Now you no longer need to scroll through log files manually, stopping at every strange record. It’s automated, visualized and cleaned up. That’s grok’s doing too.

Pattern syntax


The grok pattern syntax is simple. Each field starts with a “%” symbol and is enclosed in braces. Fields can follow one another directly or be separated by a delimiter: whitespace, a semicolon, a colon, whatever. Each field has two parts split by a colon.

The first part is the field type. It could be an integer, a word, an IP address or anything else. Here are the basic log field formats. With basic regexp knowledge you can easily pick a suitable type. You can also build your own field types for special needs: just declare them in a patterns file referenced by the Logstash filter, then use them in the main pattern. Of course, a field can include any other fields.

The second part is the field name. In the JSON key-value format it becomes the key. Elasticsearch indexes records by this name, and Kibana shows these names as table columns. Names make discovery clearer: you start to recognize what each part of the message is for.
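
To illustrate both parts, here is a sketch of a custom patterns file; the SESSIONID and CLIENT types are hypothetical names invented for this example:

# /etc/logstash/patterns/extra
# a custom type defined by a raw regexp:
SESSIONID [A-Za-z0-9]{16}
# a composite type that reuses other fields:
CLIENT %{IP:client_ip}:%{NUMBER:client_port}

In the main pattern you would then write %{SESSIONID:session_id} to get a JSON key named session_id.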

Write your first pattern


The goal is to structure the Nginx logging. A combined-format Nginx log record looks like this:

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"

First we have to work out how the fields are split and what each one stands for. The delimiters are not uniform: some fields are divided by whitespace, while others contain whitespace themselves and are enclosed in quotes, which are useless clutter in the log view. Here is what each field means:

123.65.150.10 – the client IP address
- – the authenticated user ID (blank in this case)
- – the authenticated user name (blank in this case)
23/Aug/2010:03:50:59 – the access date and time
+0000 – the server timezone (here it’s UTC)
“POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1” – the HTTP request (method and desired page) with the protocol version
200 – the response status code
2 – the size of the response in bytes
http://www.example.com/wordpress3/wp-admin/post-new.php – the referrer URL of the web client
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3 – the user agent description

Now we want to structure this record and cut off the useless parts. First, let’s create a new file for the pattern. It belongs in the patterns directory referenced in the Logstash filter configuration, usually /etc/logstash/patterns. Then write a custom pattern for the Nginx log records:

NGINX %{IPORHOST:nginx_clientip} (?:%{USER:nginx_user_ident}|-) (?:%{USER:nginx_user_auth}|-) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:nginx_http_request} %{URIPATHPARAM:nginx_request_desc}(?: HTTP/%{NUMBER:nginx_http_version})?|-)" %{NUMBER:nginx_response} (?:%{NUMBER:nginx_bytes}|-) "(?:%{URI:nginx_referrer}|-)" "%{GREEDYDATA:nginx_user_agent}"

Explanatory notes:

  1. The square brackets in \[%{HTTPDATE:timestamp}\] are escaped with backslashes; that’s a regexp rule. Like any literal symbols between fields, the brackets won’t be delivered to Elasticsearch, which makes the log records more readable.
  2. The construction (?:%{USER:nginx_user_auth}|-) is an alternation: the field matches either the USER pattern or a literal “-”. For a pattern that cannot match “-” (like NUMBER in (?:%{NUMBER:nginx_bytes}|-)), a blank field is simply not delivered. Note that USER itself can match “-”, so the two user fields will still capture the dash.
  3. The construction HTTP/%{NUMBER:nginx_http_version} extracts only the number of the HTTP protocol version; the literal “HTTP” is left out (because you already know it’s HTTP). Frankly, I’d suggest skipping the version entirely: it’s almost always 1.1, and 2.0 isn’t coming soon. The resulting fields are sketched below.
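
Applied to the sample record above, the pattern yields roughly these fields (derived by hand, not verbatim Logstash output):

{
  "nginx_clientip": "123.65.150.10",
  "nginx_user_ident": "-",
  "nginx_user_auth": "-",
  "timestamp": "23/Aug/2010:03:50:59 +0000",
  "nginx_http_request": "POST",
  "nginx_request_desc": "/wordpress3/wp-admin/admin-ajax.php",
  "nginx_http_version": "1.1",
  "nginx_response": "200",
  "nginx_bytes": "2",
  "nginx_referrer": "http://www.example.com/wordpress3/wp-admin/post-new.php",
  "nginx_user_agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"
}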

The next step is to reference this pattern in a Logstash filter file. Create a new file in /etc/logstash/conf.d or append the filter to an existing one; let’s take the first route here. The config file should have a .conf extension and a numeric prefix that fixes its order. So let’s make a 10-nginx-filter.conf file and add these lines:

filter {
  if [type] == "nginx" {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => { "message" => "%{NGINX}" }
      named_captures_only => true
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "nginx_clientip"
    }
  }
}

Explanatory notes:

  1. The log type is checked first, right after the Filebeat input. You can set the type in the Filebeat config file on each node (see the sketch after this list).
  2. The patterns_dir option points to the directory that holds your patterns. Here it’s the default, /etc/logstash/patterns.
  3. The named_captures_only option ships only named fields to the output.
  4. The match option applies the pattern to the record’s message field.
  5. The date directive chooses the field used for time tracking and converts it to the canonical Logstash timestamp format. The format string has to match the field: for HTTPDATE that’s dd/MMM/yyyy:HH:mm:ss Z.
  6. Geoip is one of Logstash’s features. You point it at the client IP address field and it tracks the location: additional geoip fields like city, region and country are added to the JSON automatically. With geoip you can pick out records by client location and determine the source of server trouble: a client could be a source of spamming, snooping and so on.
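
For reference, here is a sketch of setting the type on a node in Filebeat (this uses the document_type option from Filebeat 1.x/5.x; newer versions replaced it with custom fields):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/access.log
  document_type: nginx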

Now you can easily sort through your logs and shoot your troubles carefree. Have fun, but keep in mind the Java memory appetite that comes with Logstash 🙂
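
If you run Logstash 5.x or later, the heap size lives in its jvm.options file; the 1 GB below is an example value, not a recommendation:

# /etc/logstash/jvm.options
-Xms1g
-Xmx1g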

Bonus: here is my GitHub repo with Logstash filters and grok patterns. You’re welcome to take a look if you’re interested.
