Threat Detection cost & value: a few lessons from the field.

Below is a question I started asking myself some years ago, when I realized I could write log-based detection content for a living:

How to determine detection value?

How could customers buy a “detection” if they cannot evaluate its value? Or how could I estimate value for a detection I design?

Just sharing some ideas around this interesting yet controversial topic.

Risk management, anyone?

Stepping outside detection controls for a moment: in vulnerability management this is a well-known and controversial topic. Which risk should be addressed over another? Which patch should be applied ASAP?

The one providing the highest return (value relative to cost)? Probably yes, but how do you calculate, or even estimate, that?

To be honest, I only got into those formulas when prepping for a CISSP exam more than a decade ago. Despite that, I've never seen a clear, consistent approach across organizations I've worked with, except for a very few cases (ICS/Military).
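For reference, here's a minimal sketch of the kind of quantitative formulas that curriculum covers (ALE = SLE x ARO, plus a simple return-on-security-investment check). The dollar figures are made up purely for illustration:

```python
# Classic quantitative risk formulas (the numbers below are illustrative only).

def ale(sle: float, aro: float) -> float:
    """Annualized Loss Expectancy = Single Loss Expectancy x Annualized Rate of Occurrence."""
    return sle * aro

# Hypothetical scenario: a breach costing $50k per occurrence, expected twice a year.
ale_before = ale(sle=50_000, aro=2.0)   # 100,000 / year
# A control (e.g., a new detection + response capability) cuts the expected rate to 0.5/year.
ale_after = ale(sle=50_000, aro=0.5)    # 25,000 / year
control_cost = 30_000                   # annual cost of the control

# Simple return on security investment: risk reduction vs. what the control costs.
rosi = (ale_before - ale_after - control_cost) / control_cost
print(f"ALE before: {ale_before:,.0f}, after: {ale_after:,.0f}, ROSI: {rosi:.0%}")  # ROSI: 150%
```

Clean on paper, but as the rest of this post argues, the hard part is feeding those variables with numbers anyone actually trusts.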

Nevertheless, there's one common thread here: how leaders or decision makers perceive those risks. And that's where it becomes very subjective.

The way they perceive risk is determined by many factors: their level of exposure to certain cyber topics (attack trends, threat intel, etc.), the news they consume, and the people they interact with at work and socially.

What about data-driven approaches applied to detection engineering practice?

If you are only looking for metrics, also consider the following reading:

"Threat detection metrics: exploring the true-positive spectrum"

Kenna Security is among the few companies I follow that leverage data-driven methodologies to better estimate value and execute vulnerability management cycles. The approach is really interesting and worth checking out.

Nevertheless, among the counterarguments generally applicable to stats/ML-based approaches, some say we need reliable data to back them up and make them work as expected.

Assuming we have enough of that, what happens when models suggest your administrators prioritize yet another MS Office patch over a router bug recently reported by Cisco that affects a device in the external DMZ?

More context (input) needed? Are the models properly accounting for that? Are engineers properly feeding those models?

Besides having access to and understanding external data, organizations must understand their own data; no model fits all. Again, I love the term Local Context suggested by Anton Chuvakin.

So collecting and making sense of that local/internal data is perhaps the main challenge regardless of the approach.

No asset inventory, no deal? How can an enterprise expect success with ML/AI when it can barely track high-privilege accounts or activity in its own environment?

How to *estimate* value in detection?

There's no easy formula. It's a process with many inputs, and it depends on the consumer of the detection. Sorry to disappoint you!

Finding the sweet spot you and your customer agree upon paves the way for establishing a consistent strategy in the long run.

Since customers may perceive risks differently, you need to factor that in. Similarly, a detection might be prioritized to cover an important gap identified well outside our common perception.

For instance, a key business process may rely on an exposed web server which requires long impact analysis cycles before a new security patch is approved and ready to be installed.

Therefore, detection controls play a very important role in this context.

Now, how critical is that scenario (exposed web server) compared to the following, which also requires strong security monitoring?

  • A jump server or a bastion host (RDP, SSH) bridging test/development & production environments (more common than you think);

It depends. We all know the value of a Mercedes-Benz is higher than that of a VW. But how much does it cost to get there?

I believe the detection implementation cost must also be part of the equation. Below are a few questions on that (a rough costing sketch follows the list):

  • How much time and how many resources (that is, effort) are needed to enable and store the necessary log telemetry?
  • How much time to craft a detection model to accurately detect a threat?
  • Is the SOC able to follow up and engage the right response once such a specialized alert is triggered? If not, where's the value in the end?
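To make those questions a bit more concrete, here's a toy sketch of how they could be rolled up into a rough cost figure per detection. The fields and numbers are hypothetical; the point is only that telemetry, engineering, and SOC follow-up effort all belong in the same estimate:

```python
from dataclasses import dataclass

@dataclass
class DetectionCost:
    """Rough, illustrative cost model for a single detection (all fields are hypothetical)."""
    telemetry_onboarding_hours: float   # effort to enable/route the required log telemetry
    monthly_storage_gb: float           # extra data volume the detection depends on
    engineering_hours: float            # effort to design, build and test the detection logic
    monthly_triage_hours: float         # expected SOC follow-up effort for resulting alerts

    def first_year_hours(self) -> float:
        # One-off build effort plus a year of triage; storage is tracked separately.
        return self.telemetry_onboarding_hours + self.engineering_hours + 12 * self.monthly_triage_hours

# Example: a detection needing brand-new telemetry vs. one reusing data already onboarded.
new_telemetry = DetectionCost(40, 300, 24, 6)
reuse_existing = DetectionCost(0, 0, 16, 4)

for name, cost in [("new telemetry", new_telemetry), ("reuse existing", reuse_existing)]:
    print(f"{name}: ~{cost.first_year_hours():.0f} hours in year one, +{cost.monthly_storage_gb} GB/month")
```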

As in other areas of cybersecurity, those answers might reveal other weak points, perhaps requiring even more strategic decisions.

What about MITRE's ATT&CK coverage?

Here we need to make a very clear distinction: data source coverage is very different from detection coverage.

The former helps a lot in estimating the cost (data onboarding/retention). But the actual cost should also consider the design, implementation and maintenance of detection code.

Stating which data sources cover a certain ATT&CK technique still requires a detection, or a set of detections, to be developed before anything is actually detected, right?

There's limited value in video camera recordings without proper motion sensors or guards watching them, correct?

Likewise, despite enabling post-mortem investigations and the potential development of detections, simply storing logs is far from enough.

The goal is to define an approach or process to build upon those logs and enable the continuous delivery of agreed and expected detection value.
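As a toy illustration of that distinction, the sketch below computes data-source coverage vs. actual detection coverage for a handful of ATT&CK techniques. The technique names are real, but the inventory sets and the data-source mapping are simplified and made up:

```python
# Toy inventories (illustrative only): what telemetry we collect vs. what detections exist.
onboarded_sources = {"process_creation", "windows_security", "dns_logs"}

# Data sources typically associated with a few ATT&CK techniques (simplified mapping).
technique_sources = {
    "T1059 Command and Scripting Interpreter": {"process_creation"},
    "T1078 Valid Accounts": {"windows_security"},
    "T1071 Application Layer Protocol": {"dns_logs", "netflow"},
}

# Techniques for which we actually wrote and deployed detection logic.
deployed_detections = {"T1059 Command and Scripting Interpreter"}

for technique, sources in technique_sources.items():
    has_data = sources <= onboarded_sources           # all required telemetry onboarded?
    has_detection = technique in deployed_detections  # and is there detection code on top of it?
    print(f"{technique}: data coverage={has_data}, detection coverage={has_detection}")
```

Even in this toy example, only one of the three techniques is actually covered by a detection, despite most of the telemetry already being collected.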

Data source detection potential as a metric

I'm not going to start another SIEM rant (why it fails, etc.). I'll just leave this here: starting log collection without considering detection use case design leads to massive amounts of data sitting in your SIEM, completely untouched.

Forensics and Compliance are NOT detection use cases — despite some overlapping in terms of data sources and log telemetry needed.

If it's feasible in your org, determine how many detection use cases you have, broken down by data source.

Tip: In Splunk, every query, whether ad-hoc, user-dispatched, or scheduled, is logged to the _audit index.

Try to estimate what percentage of your data is ever looked up. That's a strong figure to bring to the table. And don't be surprised if it lands somewhere around 40–60%!
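As a rough starting point, here's a sketch of that estimation using the splunk-sdk Python library. It assumes default audit logging is in place; the connection details, time window, and index-extraction regex are illustrative, not a definitive method:

```python
import re

import splunklib.client as client
import splunklib.results as results

# Connection details are placeholders; adjust to your deployment.
service = client.connect(host="localhost", port=8089, username="admin", password="changeme")

# All non-internal indexes known to this Splunk instance.
all_indexes = {idx.name for idx in service.indexes if not idx.name.startswith("_")}

# Pull recent search-audit events and note which indexes the searches referenced.
# Field names may vary with your audit configuration; treat this as a starting point.
stream = service.jobs.oneshot(
    'search index=_audit action=search info=completed earliest=-30d | fields search',
    output_mode="json",
)

searched = set()
for event in results.JSONResultsReader(stream):
    if isinstance(event, dict):
        for name in re.findall(r'index\s*=\s*"?([\w-]+)', event.get("search", "")):
            searched.add(name)

never_queried = all_indexes - searched
pct = 100 * len(never_queried) / max(len(all_indexes), 1)
print(f"{len(never_queried)}/{len(all_indexes)} indexes (~{pct:.0f}%) not queried in the last 30 days:")
print(sorted(never_queried))
```

Counting by index is a coarse proxy (a single lookup "touches" the whole index), but it is usually enough to start the conversation about unused data.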

Why does it matter?

When introducing any value proposition, it must include the costs.

And that should consider data onboarding (infra or audit policy management) and maintenance costs (license, storage, admin overhead).

I couldn't approach this topic without referencing the following chart from amazing research done by brothers Jose and Roberto Rodriguez:

[Chart: the importance of EDR/endpoint log telemetry for detecting the ATT&CK techniques that threats leverage]

BAS: one (commodity) attack fits all?

Perhaps on one thing we can agree here: regardless of the industry, most if not all organizations might become targets of commodity malware and other non-APT threats (if that's a thing).

Also, referencing the concept of detection in depth, we should consider that an attacker will invariably leverage some common techniques (persistence, exfiltration), which are more prone to detection.

Following this line of thinking, I believe it makes perfect sense to integrate BAS and attack automation solutions into detection engineering workflows, not only to validate detection controls but also to drive backlog priority.

These will definitely complement individual, tailored red team exercises while making detection tests and the overall QA process repeatable.

Actually, that's a clear metric for estimating value, isn't it? The challenge, perhaps, shifts to deciding which set of attacks to prioritize and perform.
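As a rough illustration of how that metric could be produced, here's a sketch of a simple validation loop. run_simulation() and alert_fired() are hypothetical placeholders for whatever BAS platform and SIEM/EDR API you actually use; the technique IDs are real ATT&CK entries picked arbitrarily:

```python
import time

# Hypothetical ATT&CK techniques in the detection backlog (IDs are real, selection is arbitrary).
BACKLOG = ["T1053.005", "T1547.001", "T1041"]  # Scheduled Task, Registry Run Keys, Exfil over C2

def run_simulation(technique_id: str) -> None:
    """Placeholder: trigger the BAS/attack automation tool for a given technique."""
    raise NotImplementedError("wire this to your BAS platform")

def alert_fired(technique_id: str, within_seconds: int = 300) -> bool:
    """Placeholder: poll your SIEM/EDR for an alert tagged with the technique."""
    raise NotImplementedError("wire this to your alerting backend")

def validate(backlog: list[str]) -> float:
    """Run each simulated attack and report the share that produced an alert."""
    detected = 0
    for technique in backlog:
        run_simulation(technique)
        time.sleep(5)  # give the alerting pipeline a moment before checking
        if alert_fired(technique):
            detected += 1
    return detected / len(backlog)

# e.g. print(f"Detection rate across simulated techniques: {validate(BACKLOG):.0%}")
```

The resulting detection rate, tracked over time and per technique set, is exactly the kind of figure that makes detection value tangible to stakeholders.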

More on that soon!
