SIEM Hyper Queries: introduction, current detection methods (part I/II)

The Splunk language is very powerful. I've been writing SPL for years and I still keep discovering new ways to use it, especially when browsing the docs or the community forums trying to solve another problem.

This year, I've published a query for detecting multiple flavors of password brute-force attacks using the streamstats command. That query leverages some of the characteristics of what I am calling a Splunk Hyper Query.

So what's a Hyper Query?

Hyper- is a prefix from Greek meaning “over,” usually implying excess or exaggeration (hyperbole).

In a similar way, a SIEM hyper query (overly) performs multiple checks and iterations over the event stream before providing results back. The idea is to capture as many signals as possible from within one single query.

In this post I'm going to provide an overview of the thought process I use to craft those queries, which applies to many high-value detection use cases (suspicious command line sessions, suspicious web proxy sessions, etc).

Flag qualifiers

One command may carry multiple signals. For instance, it might be loading a suspicious DLL while also saving results using Alternate Data Streams (ADS). Therefore, we need a way to hold that information for each single event.

Here's one idea for solving that in SPL. An event (log record) is evaluated against multiple checks whose results are stored and accumulated using eval's mvappend():

| eval qualifiers=if(match(command, "bad"), mvappend(qualifiers, "Bad was found"), qualifiers)

Those results are treated as qualifiers from which we are able to rate (score) how relevant or suspicious each record is. But how to set a score from there?

Setting a score

Unlike most procedural languages, in SPL we are not able to execute a code block or a set of instructions (commands) once a statement is true or false. We are only able to set the value of a field based on an IF/ELSE/CASE statement.

Also note that in SPL, every single command in a query is executed (evaluated). We can't prevent some commands from being evaluated.

As an example, here's how we would write it in pseudocode:

if command has 'BAD':
    qualifiers.append("Bad was found")
    score.append(3)

Besides accumulating a qualifier, the score is also stored only if the statement evaluates to true. In SPL, we can try something like this:

| eval qualifiers=if(match(command, "bad"), mvappend(qualifiers, "Bad was found # score: 3"), qualifiers)

Later, we extract the scores observed from each qualifier string. Of course, there are many other approaches to this; happy to discuss ideas!
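As one possible sketch of that extraction step (the regex and the `qstring`/`score` field names are my assumptions, based on the qualifier format above), the scores can be pulled out with rex:

```spl
| eval qstring=mvjoin(qualifiers, ";")
| rex field=qstring max_match=0 "score:\s*(?<score>\d+)"
```

The mvjoin() step flattens the multivalue qualifiers into one string so that rex with max_match=0 can capture every score. The resulting `score` is a multivalue numeric field, which a later `stats sum(score)` (split by host/session) adds up into a single total.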

Single Events vs. Sessions

With all qualifiers for each event, the idea now is to start evaluating a subset of events (session) and calculate their overall score. This way we can better determine how strong the signal is from a session perspective.

In the end, we don't want to bother the SOC with a distinct alert for every single suspicious command seen from the same host, we actually want to fire one single alert showing the bigger picture.

Now, we define what constitutes a "session". In SPL, one way to do that is with the bin (a.k.a. bucket) command.

Assuming we are inspecting commands executed in a host, we could split those commands into 1-minute sessions for each host with the following:

| bin span=1min _time AS session_id
| stats values(qualifiers) AS qualifiers by host, session_id

Assuming a detection query's lookback time is 1h, detection engineers could generate metrics for each sampling window (the 1-minute sessions and the full 1h period).

For instance, the overall suspiciousness or risk of a host can be derived from how many distinct qualifiers were seen, what the highest qualifier or session score was, and the overall score observed over the entire period (1h).
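A sketch of how those metrics could be computed, assuming each event already carries a multivalue `score` field extracted from its qualifiers:

```spl
| bin span=1min _time AS session_id
| stats sum(score) AS session_score, values(qualifiers) AS qualifiers by host, session_id
| eventstats sum(session_score) AS overall_score,
    max(session_score) AS top_session_score,
    dc(qualifiers) AS distinct_qualifiers by host
```

Here stats aggregates per 1-minute session, while eventstats then rolls those session results up per host across the whole 1h lookback, without collapsing the session rows.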

But hey, isn't that what Risk Based Alerting (RBA) is about? Not really, but before answering that, let's first get a quick summary of traditional detection strategies.

How are SIEM rules (detections) written today?

The vast majority work by checking whether a pattern exists in the data, then: "alert!" Some vendors even dare to call that correlation. An easy example comes from how AVs traditionally check for the EICAR pattern in files/streams.

This is the pattern-based detection method, certainly the cheapest one. The eval match() function from earlier becomes a pattern-based detection component within a Hyper Query.

While this is simple and sometimes effective (ex.: IOC scanners), it is by far the #1 detection methodology contributing to high alert volume and other problems overwhelming SecOps teams.

Today, we have more elaborate methods for threat detection:

Behavioral based: as the name implies, instead of checking for a single, static value or pattern in data, we start checking for "negative matches" or known unexpected traces or behaviors.

For instance, "If anything other than process X attempts to access lsass.exe's memory, generate an alert"; or "If winword.exe spawns powershell.exe or wmic.exe, generate an alert".
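The second example can be sketched in SPL as follows (the index, event source and field names are assumptions modeled on Sysmon process-creation events; adjust them to your own data):

```spl
index=endpoint EventCode=1 ParentImage="*\\winword.exe"
    (Image="*\\powershell.exe" OR Image="*\\wmic.exe")
| stats count, min(_time) AS first_seen, values(CommandLine) AS commands
    by host, ParentImage, Image
```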

Anomaly based: also known as UBA/UEBA or behavioral analytics based. The idea is to determine which scenarios are critical enough to be profiled and later compare a baseline to the current, observed state, using statistics.

Whenever the deviation is high enough, generate an alert. In very simple words, here's an example: "If admin A logs in at 3am and that deviates from the admin group's baseline on login times, then generate an alert".
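A naive sketch of that example (index, field names and the 3-sigma threshold are assumptions; a real login-time baseline would also need to handle the wrap-around at midnight):

```spl
index=auth action=success
| eval login_hour=tonumber(strftime(_time, "%H"))
| eventstats avg(login_hour) AS avg_hour, stdev(login_hour) AS sd_hour by admin_group
| where sd_hour > 0 AND abs(login_hour - avg_hour) > 3 * sd_hour
```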

For simplicity, let's include in this category any outlier detection or any other method using statistical models (ML).

Some stats-based detection techniques, such as clustering and stack counting, were brilliantly described by David J. Bianco some years ago. These and other techniques are available from the OTRF repository.

The cost of these detection methods varies widely.

While the infamous "long tail" analysis or any other detection based on rarity (count of occurrences) is relatively cheap, properly doing baseline analysis, which sometimes implies having access to an inventory or local context, is definitely more expensive.
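A classic long-tail sketch over process executions (field names assumed): with an ascending sort, the rarest binaries float to the top.

```spl
index=endpoint EventCode=1
| stats count, dc(host) AS hosts by Image
| sort + count
| head 20
```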

This is of course a more complex detection approach, with great potential once you know which scenarios to baseline, which model or method to apply, how to perform feature engineering, etc.

If you are not working with this now, you will soon! But we’ll leave this amazing subject for another blog. (hint: no MLTK needed!)

Risk Based Alerting (RBA)

You've probably heard the term "Meta-Alert". That's basically generating alerts by consuming other alert events. Somehow that's what we do when we generate an alert out of an AV infection event consumed by a SIEM.

But here we are talking about generating a new alert when, for instance, there are more than 2 distinct (SIEM) detections firing from the same host within a period of time. That's what I refer to as a meta alert.

Now, to avoid firing alerts on weak signals, we turn single/atomic detection rules into generators of observable or indicator events. A single query (detector) is then used to aggregate those signals and possibly generate an alert.
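A minimal sketch of such an aggregating detector, assuming the atomic rules write their indicator events to a `risk` index with `risk_object`, `search_name` and `risk_score` fields (names loosely follow Splunk ES conventions; the thresholds are illustrative):

```spl
index=risk
| stats dc(search_name) AS distinct_detections,
    sum(risk_score) AS total_risk,
    values(search_name) AS detections by risk_object
| where distinct_detections > 2 OR total_risk > 100
```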

That's, in summary, the concept behind RBA. Splunk's ES has been leveraging it for some years now, with additional capabilities such as setting scores on entity objects (risk indicators) and more.

If you want to hear from the pioneers, consider getting in contact with Stuart McIntosh and his team at Outpost Security; those guys have a lot of experience when it comes to this detection strategy (they have a Slack room!).

RBA vs. Hyper Queries

In case you haven't realized, both these methods have something in common.

They consume from a list of aggregated indicators or qualifiers, split by entity (ex.: affected host) over a period of time and potentially alert once a stronger signal is observed.

While RBA is more like a framework, where the entities and risk indicators will apply to pretty much all detection use cases, a hyper query is designed with one detection use case in mind.

As always, there are pros and cons for each method and those are not mutually exclusive. I may explore this topic in a dedicated article soon.

The idea here is simply to share a new vision and inspire others to improve it and share their ideas too.

In summary

A SIEM hyper query (HQ) leverages some or all of the following:

  1. Multiple indicators or qualifiers are checked within a single query; these may come from pattern-, behavioral- or anomaly-based detection components;
  2. Besides evaluating individual events, a HQ also evaluates and rates event sessions, broken out by an entity (ex.: host);
  3. Each indicator carries a score which is later used to rate how suspicious or risky a scenario is, serving as the main metric to determine whether a scenario is worth alerting on;
  4. Specific to SPL, stats/streamstats and mainly eventstats commands are used to calculate multiple metrics by using distinct split-by clauses.
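Putting the pieces above together, a skeleton HQ might look like this (the patterns, field names, span and threshold are all placeholders, not a production detection):

```spl
index=endpoint sourcetype=command_log
| eval qualifiers=if(match(command, "(?i)bad"), mvappend(qualifiers, "Bad was found # score: 3"), qualifiers)
| eval qualifiers=if(match(command, "(?i)rundll32"), mvappend(qualifiers, "rundll32 observed # score: 2"), qualifiers)
| where isnotnull(qualifiers)
| eval qstring=mvjoin(qualifiers, ";")
| rex field=qstring max_match=0 "score:\s*(?<score>\d+)"
| bin span=1min _time AS session_id
| stats sum(score) AS session_score, values(qualifiers) AS qualifiers by host, session_id
| eventstats sum(session_score) AS overall_score, dc(qualifiers) AS distinct_qualifiers by host
| where overall_score >= 5
```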

In the second part of this blog series, I'm going to provide a concrete, easily consumable detection example, along with other detection use cases where I have been successfully leveraging this approach.

For now, this is an introduction to Hyper Queries and how they stack up against other detection methods.

Blueteamer. Threat Detection Engineering & Research.