Although the topic of this post is familiar to some, I am pretty sure it will resonate with many and perhaps even hurt a few.
Let’s start by addressing a well-known term and one of the main challenges for all SIEM and Log Management practitioners: event normalization.
While there are many definitions, most associate it with the process of following a standard to reduce records to common event attributes, that is, common field names and values.
In practice, while firewall X logs contain src and dst fields, firewall Y uses src_ip and dest_ip to store the same values. Also, while one refers to dropped packets as ‘drop’, the other prints ‘deny’ in its logs.
Now imagine how it goes for an AV or EDR log record, where a single entry describing a process (or command?) sometimes contains over one hundred attributes.
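To make the firewall X/Y example concrete, here is a minimal sketch in Python (not Splunk code) of what normalization means in practice: renaming vendor fields and translating vendor values into one common set. The vendor keys, the `action` field name and the target names (`src_ip`, `dest_ip`, `blocked`) are illustrative assumptions, not taken from any real product's log format.

```python
# Hypothetical per-vendor field mappings: vendor field name -> common field name.
FIELD_MAP = {
    "firewall_x": {"src": "src_ip", "dst": "dest_ip"},
    "firewall_y": {"src_ip": "src_ip", "dest_ip": "dest_ip"},
}

# Hypothetical value mapping: vendor action value -> common action value.
ACTION_MAP = {"drop": "blocked", "deny": "blocked", "accept": "allowed", "permit": "allowed"}


def normalize(vendor: str, record: dict) -> dict:
    """Rename known fields and translate known action values; keep everything else as-is."""
    mapping = FIELD_MAP.get(vendor, {})
    out = {mapping.get(key, key): value for key, value in record.items()}
    if "action" in out:
        # Unknown values pass through untouched instead of being dropped.
        out["action"] = ACTION_MAP.get(out["action"], out["action"])
    return out
```

Every new vendor, format version or value spelling needs another entry in these tables, which is exactly the maintenance burden discussed below.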
Events versus Logs — what’s the difference?
To fully understand my claim, it’s important to note the distinction between events and logs in this context. The idea is simple: a SIEM ingests or collects logs from other systems and turns those logs into events.
While an application developer owns the logging process of an app or a system, the SIEM vendor (or its developers) governs how the SIEM processes one or more logs into an event.
In the end, it's an ecosystem. There are many, many inputs and one output. It sounds simple, and it seems logical to expect a log standard to fit in there.
Reality hits you hard, Bro! (worth watching — again!)
It turns out that once data is available for on-boarding, log format and normalization are the next common roadblock in a SIEM project.
Below are just a few of the contributing factors:
- Custom business apps are likely to use a custom log format;
- Firewall X supports log format X, while firewall Y goes with log format Y;
- An old, previously supported data input gets a new version with a new log format not recognized by the SIEM parser or extractor.
The normalization cake is a lie!
In enterprise or large environments, event normalization is never, ever going to be fully achieved. Period. I’ve been involved in many projects with solid processes, and I have yet to see normalization efforts keep up with the number of new, ever-changing data sources the SIEM needs to process.
Simply face it: you can’t control or influence what a third-party developer uses for application or system logging. And while some may agree that XML isn't the best log format, content is a completely different story.
The truth is, many realized this years ago and simply gave up on SIEM (or are dying trying). But to put a more positive spin on it, let me phrase it a bit differently so that, hopefully, I don't lose you:
SIEM event normalization is utopia.
SIEM alert normalization is a must.
This becomes easier to understand once you assume logs turn into events, and events eventually turn into security alerts or indicators. More on that later in this post.
So do yourself a favor: balance the effort and do not set normalization as a milestone or a requirement before providing value. This will help you set expectations and build trust with your customers (internal/external).
Although event normalization is hard to tackle, it does not mean SIEMs are ineffective (or dead!). My point is that as long as data is available, regardless of its normalization state, you can already benefit from it.
The good news is that technology is also evolving in this respect. Let me provide a more concrete example of how we can get over this obsession with normalized SIEM events.
SIEM as a development platform
I've been writing about this for a while, after spending a great deal of time on content development: the more you treat your SIEM as a development platform and embrace customization, the better your chances of success.
This mindset enables users to better plan and prioritize any changes needed when it comes to normalization and other data polishing efforts.
Ultimately, the minimum requirement is events (data), not normalized events (polished data).
One of Splunk's hidden gems
Today, many SIEMs on the market still require a full-blown parser to be written and maintained to properly extract fields/values from an event.
And this is needed before the data is stored and exposed to users. Requiring a parser before storing the event is a blocker. Not only that: until a new parser is available, new logs cannot be consumed at all!
In short, no parser, no game.
There are many cool SIEM products out there, but I will pick Splunk as an example since it's the platform I've been working on for years.
To cut through marketing terminology (schema-less, etc.), here's how users can benefit from search-time field extraction, otherwise known as parsing:
- Throw a JSON, CSV, XML or key=value based log stream at Splunk and, with little or no configuration, it will recognize fields and values. If the log content changes (e.g., a new field is added), there's no need to change anything!
- Throw ANY data at Splunk and the user is still able to 'parse', or extract fields/values, at consumption or search time (mainly using the rex command). For instance, to extract the sender and recipient addresses from an email:
| rex field=_raw "From: (?<from>.*) To: (?<to>.*)"
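The two bullets above boil down to "extract structure when reading, not when storing". Below is a rough Python analogue (an illustration, not Splunk's implementation) of both ideas: generic key=value recognition that picks up new fields without code changes, and a rex-style named-group extraction. Note that Python spells named groups `(?P<name>...)`, while Splunk's rex uses the PCRE form `(?<name>...)`; the function names `auto_kv` and `rex` are hypothetical.

```python
import re

# Simple key=value pairs: values are either double-quoted or run up to the next whitespace.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Same expression as the rex example, written with Python's named-group syntax.
FROM_TO = re.compile(r"From: (?P<from>.*) To: (?P<to>.*)")


def auto_kv(raw: str) -> dict:
    """Extract every key=value pair found in the raw text, with no predefined schema."""
    return {key: value.strip('"') for key, value in KV_PATTERN.findall(raw)}


def rex(raw: str, pattern: re.Pattern) -> dict:
    """Rough analogue of `rex field=_raw`: return the named groups if the pattern matches."""
    match = pattern.search(raw)
    return match.groupdict() if match else {}
```

A log line that suddenly carries an extra pair, such as `new_field=42`, is picked up by `auto_kv` without touching the code, while `rex` lets you pull fields out of free text on demand.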
To be clear, the benefits above are available regardless of whether a normalization standard is followed. The Splunk documentation covers this in more detail.
A few minor steps (timestamps/time zones, file headers) are needed to perfect it, but that's about all you need. The data is readily available for consumption.
That's a huge, huge step compared to older SIEM products that rely heavily on log parsing. Some big names are actually on that list, but only a few customers realize this shortcoming during a PoC.
You may say 'But hey, it would be great to have all my firewall events following a standard schema'. I agree, but this is not a requirement. Simply spend 5 minutes checking a sample data set (e.g., index=firewall | head 100), understand its context, find which field holds the source IP (say, 'src_ip'), write your queries and carry on.
At a later stage, again balancing the effort (quick wins) against the project goal (value), you may revisit normalization. For instance:
- By handing your search-time extractions to the Splunk admin or the team responsible for data on-boarding, so they can be implemented at the server level, making your rex commands redundant;
- By encouraging, not enforcing, the app developer to follow normalization standards or guidelines, for instance, by pointing them to Splunk's Common Information Model (CIM).
In the end, I'm not saying you should ignore normalization, but you should balance its effort against other actions within your project. Even if a query contains a few rex/rename commands, the value is still delivered.
Once you are able to tackle normalization, the benefits are greater, I fully get it: cleaner, easier-to-maintain code and documentation, and better event typing/tagging, among others.
However, the lack of normalization should not prevent you from consuming the data and writing code (rules, dashboards, etc).
Now, since you own and have full control over the queries generating alerts (correlation or simple scheduled searches), you have no excuse not to follow a standard and normalize all fields in your alerts.
But why is that? Here are just a few of the benefits:
- SOC/analysts will easily understand the context around an alert;
- It's easier to write and maintain handling guidelines (playbooks);
- It's easier to integrate or push alerts to another platform, for instance, a case or incident management system;
- It's easier to write custom dashboards or reports consuming your alerts. In Splunk ES, this is the data stored in the 'notable' index.
In practice, since you are likely already aggregating values with SPL stats/chart commands before raising an alert, normalizing the alert itself takes little extra effort.
Let's say you need to raise one alert whenever there are blocked connections towards the network 10.1.1.0/24, grouping alerts by source IP address.
A query prototype would look like the following:
index="firewall_xyz" dst_ipaddr="10.1.1.0/24" outcome="blocked"
| stats count AS note,
values(dst_ipaddr) AS dest_ip,
values(dst_port) AS dest_port,
values(protocol) AS proto by src_ipaddr
| rename src_ipaddr AS src_ip
| eval note="There were ".note." packets blocked!"
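For readers less familiar with SPL, the following Python sketch mirrors what the query above is meant to produce: filter blocked connections into 10.1.1.0/24, group them by source IP address as the alert description calls for, collect the distinct (sorted) values per group, roughly what values() returns, and emit the results under the normalized field names. It assumes events are already available as dictionaries carrying the vendor field names used in the query; `build_alerts` is a hypothetical name, not part of any Splunk API.

```python
import ipaddress
from collections import defaultdict

TARGET_NET = ipaddress.ip_network("10.1.1.0/24")


def build_alerts(events: list[dict]) -> list[dict]:
    """Group blocked connections into TARGET_NET by source and emit CIM-style field names."""
    groups = defaultdict(list)
    for e in events:
        # Equivalent of: dst_ipaddr="10.1.1.0/24" outcome="blocked"
        if e.get("outcome") == "blocked" and ipaddress.ip_address(e["dst_ipaddr"]) in TARGET_NET:
            groups[e["src_ipaddr"]].append(e)  # stats ... by src_ipaddr

    alerts = []
    for src, evs in groups.items():
        alerts.append({
            "src_ip": src,                                      # rename src_ipaddr AS src_ip
            "dest_ip": sorted({e["dst_ipaddr"] for e in evs}),  # values(dst_ipaddr) AS dest_ip
            "dest_port": sorted({e["dst_port"] for e in evs}),  # values(dst_port) AS dest_port
            "proto": sorted({e["protocol"] for e in evs}),      # values(protocol) AS proto
            "note": f"There were {len(evs)} packets blocked!",  # count AS note, then eval
        })
    return alerts
```

Whatever the vendor calls its fields, the output only ever exposes the normalized names, which is what makes alerts easy to triage, document and forward.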
Even though the original events use non-CIM-compliant field names, the values() stats function, along with the rename command, produces output that complies with Splunk’s CIM by following its naming standard.
As a bonus, if you use Splunk ES, alerts from that rule will render nicely on the Incident Review page.
Happy to discuss further; feel free to contact me!