Anomaly-based detection workflow: leveraging the Novelty component using EDR log telemetry

This is a more technical post that exemplifies how my workflow goes when designing and implementing a new detection, while also serving as a follow-up to a previous post, The role of ‘Novelty’ and ‘Behaviour’ in Computer Forensics & Detection Engineering.

When you start following security researchers and their work, it’s truly fascinating how many detection use cases can be derived from it.

Let’s take as an example the great article written by John Dwyer (IBM X-Force) in which he highlights many detection aspects around another very common technique leveraged by attackers: DLL Side-Loading.

There are of course multiple detection opportunities for monitoring this technique, some of them only feasible at the agent level.

That is, when defenders rely on a custom rule leveraging the EDR product, or when file or memory payloads are available for scanning (ex.: YARA).

But as always, here I focus on log-based telemetry monitoring.

So what is left for those relying on raw command line logs?

Just like anything else in Cyber Defense, we need to think about multiple layers, and that’s when Detection In Depth (DiD) comes into play.

When you combine it with Locard’s exchange principle, it makes perfect sense to expect an operation, no matter how stealthy it is, to fall into one of our traps.

Detection Use Case

What caught my attention from that article was the following part:

Instead, threat actors or malware that have leveraged DLL side-loading commonly rely on two behaviors prior to executing an attack:

1) Plant a signed executable in a target directory along with the malicious DLL.

2) Move a Windows executable from System32 or SysWow64 on the target machine to a non-standard directory and plant the malicious DLL within the same folder.

The author later provides some hints on how to detect point #2:

Excerpt from the article mentioned above

That’s an awesome idea! Thanks for sharing, John!

But how hard is it to implement? First, let me show you how I was already halfway there.

Besides auto-submitting lists of hashes based on some scenarios, I often try to integrate VirusTotal (VT) checks into Splunk Notable landing dashboards.

Those are basically customized triage panels for analysts where much more context can be easily added. And here’s an example:

Triage Panel

That’s a panel showing a list of hashes observed as part of an alert and their results when checked against VT via its API. Not a big deal! But I want to call your attention to the Prevalence column.

What’s prevalence in this context?

Prevalence = how many hosts are seen executing a file with a given hash, in a given time period.

And here’s how hard it is to build such a baseline in SPL:

index=endpoint_telemetry ChildHash=*
| stats dc(endpoint_host) AS host_count BY ChildHash
| outputlookup childhash_baseline.csv

So when you read that you need fancy stats/ML to build baselines, that’s roughly all you need to do, trust me! If you still want to go fancy, call it Frequency Analysis.
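And inspecting the rare end of that baseline is just as simple. As a quick, hypothetical example (the single-host threshold is mine, not a recommendation):

``` hypothetical quick check: hashes seen on a single host over the baselining period ```
| inputlookup childhash_baseline.csv
| where host_count = 1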

Now, depending on how much data you have, how many endpoints are being monitored and, perhaps more importantly, whether we are talking about a massive multi-tenant environment, you need to optimize it and make it scalable, which is another big challenge on its own!

For the metric freaks out there: in one of the environments I work with, I see the following ratio when generating such a baseline from executed child hashes:

# of endpoints * 1.7 = # of unique hashes seen over a month

This ratio is holding up quite well as more (Windows) endpoints are added.

Why is that prevalence metric important? Because that’s what determines the threshold for auto-checking a hash against VT. What’s the right number? It depends on your risk appetite (and your API call quota!).

Hint 1: use lookups to enable local caching, preventing quota issues.
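To make the idea concrete, here is a minimal sketch of how the prevalence threshold and a local cache could be chained in SPL. The vt_cache.csv lookup, its vt_verdict field and the threshold of 5 hosts are illustrative assumptions of mine, not part of the original workflow:

``` hypothetical pre-filter: only rare hashes that are not already cached locally go to the VT check ```
| lookup local=1 childhash_baseline.csv ChildHash OUTPUT host_count AS baseline_host_count
| where isnull(baseline_host_count) OR baseline_host_count < 5
``` vt_cache.csv (hypothetical) would be maintained by whatever process performs the actual API calls ```
| lookup local=1 vt_cache.csv ChildHash OUTPUT vt_verdict
| where isnull(vt_verdict)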

How to track System hashes outside System folders?

It takes a few more steps, but it’s relatively simple. Just add the following to the previous SPL and there you have it:

index=endpoint_telemetry ChildHash=*
| eval ChildPath=lower(trim(ChildPath))
| eval system_path=if(match(ChildPath, "^system(x86)*\|"), ChildPath, null())
| rex field=system_path "(?<system_exe>[^\\\\]+\.[^\.]{3})$"
| stats dc(endpoint_host) AS host_count,
values(system_path) AS system_path,
values(system_exe) AS system_exe
BY ChildHash
| outputlookup output_format=splunk_mv_csv childhash_baseline.csv

Note the output_format here (not the default!). Those multi-value (mv) fields will later be compared against currently observed values.

And here’s an output sample:

Child hash baseline

The system_path regex used above targets a specific EDR telemetry format (Panda/WatchGuard) but can easily be modified to reflect any other endpoint telemetry data source, such as Sysmon (ex.: EID 1).
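As a rough sketch of that adaptation for Sysmon EID 1 (the index name, the escaping and the assumption that SHA256 hashing is enabled in the Sysmon configuration are all mine, so treat it as a starting point rather than a drop-in query):

``` hypothetical Sysmon EID 1 variant of the extraction steps ```
index=sysmon EventCode=1
| eval ChildPath=lower(trim(Image))
``` assuming SHA256 is enabled in the Sysmon configuration ```
| rex field=Hashes "SHA256=(?<ChildHash>[A-Fa-f0-9]+)"
| eval system_path=if(match(ChildPath, "^c:\\\\windows\\\\(system32|syswow64)\\\\"), ChildPath, null())
| rex field=system_path "(?<system_exe>[^\\\\]+\.[^\.]{3})$"
``` the stats and outputlookup steps remain the same as in the original search ```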

Hint 2: You can expand the concept to other relevant binaries. For instance, not every known LOLBin is located in a system folder by default (ex.: dotnet.exe).
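A hedged sketch of that expansion, riding on the same baseline search (the watchlist names, any_exe and watched_path are illustrative only, and path separators depend on your telemetry):

``` hypothetical expansion: also baseline a short watchlist of LOLBin names, wherever they live ```
| rex field=ChildPath "(?<any_exe>[^\\\\]+)$"
| eval watched_path=if(match(any_exe, "^(dotnet|msbuild|installutil)[.]exe$"), ChildPath, null())
``` watched_path can then be aggregated with values() alongside system_path in the stats step ```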

In the end, we have the following attributes available:

  1. A hash value for every binary observed as a child process; and, in case a hash is seen within a system folder:
  2. All the full paths seen for each child process;
  3. The child executable name.

Those will make it easier for us to craft a detection for known system hashes being executed not only outside the system folders, but also from locations or under names that don’t match the baseline.

In my case, since I am leveraging a Splunk Hyper Query, it only takes a few lines to add a few more indicators:

exec_hash_stats.csv is an instance of childhash_baseline.csv shown previously

The basis for the detection is below:

``` after extracting the proper fields from telemetry ```
| lookup local=1 childhash_baseline.csv ChildHash OUTPUT system_path AS known_system_paths, system_exe AS known_system_exe
| eval indicator=if(isnotnull(known_system_paths) AND ChildPath!=known_system_paths, mvappend(indicator, "New location"), indicator)
| eval indicator=if(isnotnull(known_system_exe) AND child_exe!=known_system_exe, mvappend(indicator, "New name"), indicator)
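As a small follow-up sketch (the field list in the table command is illustrative), the search could then surface only the events where an anomaly indicator was actually raised:

``` hypothetical final step: keep only events where at least one indicator was appended ```
| where isnotnull(indicator)
| table _time, endpoint_host, ChildHash, ChildPath, child_exe, indicator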

Final Considerations

This is another example of a behavior that might be part of different techniques. That means monitoring for a system hash outside system directories might trigger in many scenarios, not only DLL Side-Loading.

So spot the behavior first, then link it to an existing technique, or even suggest a new one!

Part of the challenge when designing such detections is knowing the optimal thresholds for your (customer) environment.

  • How many days back should I consider for profiling the activity?
  • What's the interval for updating the baseline? Daily? Every Nth hour?
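There is no universal answer here, but as a minimal sketch of one possible setup (a 30-day profiling window refreshed by a daily scheduled search, both values being assumptions of mine, not recommendations from the article):

``` hypothetical choice: 30-day profiling window, regenerated daily by a scheduled search ```
index=endpoint_telemetry ChildHash=* earliest=-30d@d latest=@d
| stats dc(endpoint_host) AS host_count BY ChildHash
| outputlookup childhash_baseline.csv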

During development, the main FPs came from the java.exe and javaws.exe executables, which are basically seen everywhere, both in system folders and outside them. Of course, each environment generates its own cases.

FPs — Java is omnipresent!

The good news is that, depending on how the team deals with “anomaly-based” alerts relying on rare occurrences (novelty), they require a low-effort, low-touch approach when it comes to exception handling.

That can be both good and bad, as it introduces a new risk.

Depending on the implementation, something new today will become old tomorrow. That is, it will be incorporated into the baseline and will therefore no longer be alertable. So the triage process here is even more important.

You should consider a reset mechanism so that, if something really rare happens today and we still want to keep monitoring for it tomorrow, we can prevent it from entering (or contaminating) the baseline.
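One way such a reset mechanism could look in SPL (baseline_exclusions.csv and its exclude_flag field are hypothetical names of mine) is to filter confirmed-interesting hashes out before the baseline is written:

``` hypothetical reset mechanism: keep confirmed-interesting hashes out of the next baseline ```
index=endpoint_telemetry ChildHash=*
| lookup local=1 baseline_exclusions.csv ChildHash OUTPUT exclude_flag
| where isnull(exclude_flag)
| stats dc(endpoint_host) AS host_count BY ChildHash
| outputlookup childhash_baseline.csv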

So a note for Product Management teams out there: Baseline Management interface = Exception Policy Editor. Got it?

How many vendors think about that (UX) when designing their products? Feel free to drop me a note on that!

Another option is, once a rare indicator leads to a confirmed threat, to turn that finding into another static, atomic behavior check.
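For example, such an atomic check could look like the sketch below. The choice of rundll32.exe and the wildcard path matching are illustrative assumptions; the exact conditions depend on what was confirmed and on your telemetry’s path format:

``` hypothetical atomic check derived from a confirmed case: a well-known system binary running outside the system folders ```
index=endpoint_telemetry ChildPath="*rundll32.exe" NOT ChildPath="*system32*" NOT ChildPath="*syswow64*"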

Happy hunting!

Blueteamer. Threat Detection Engineering & Research.