Can Data Science Identify Insider Threats?

September 1, 2020

According to a survey by Forrester Research in 2019, 52% of global enterprise network security decision-makers reported that their firms experienced at least one breach of sensitive data during the past 12 months. And nearly half the breaches of sensitive data came at the hands of internal actors, through either poor decisions or malicious intent. Security teams typically prepare for insider threats by monitoring and auditing access, hoping that if proactive detection fails, they’ll at least be able to do forensic analysis when an incident occurs.

This approach rarely gives security teams the lead time needed to prevent damage before it’s done. The holy grail of resilience to insider threats is the capacity to detect a threat before it occurs, a promise reflected in both private investment and research by the US Government. But the pathology of a malicious insider is very complex. An insider typically takes precautions to evade detection, so how could a software solution reliably distinguish a threat from benign activity?

Recent technological advances have shown significant progress towards predicting or anticipating what was previously considered intractable: human preferences, dispositions, and maybe even behavior. Systems like Alexa, Siri, and Cortana periodically appear to anticipate users’ needs before they’ve vocalized them.

Lots and lots of data

Behind what appear to be behavioral predictions are vast amounts of data and the critical mass of computational resources necessary to act on that data. Other similar examples include voice recognition and image analysis. At their core, these are applications of the discipline of data science. It turns out data science can be applied to insider threats in at least three ways:

The first approach involves learning the known behaviors indicative of an insider threat. Examples include exfiltration behaviors such as uploading data to a Dropbox account, extensive use of USB sticks, or data transferred from sensitive locations or in large volumes. These indicators are specific enough to catch an ongoing attack, but they are limited to behaviors that are known ahead of time.
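As a rough illustration of this first approach, known indicators can be written as explicit rules and checked against each observed event. The sketch below is not any product's detection logic: the `Event` fields, the watchlist of domains, and the size threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    """Hypothetical activity record; field names are assumptions."""
    user: str
    action: str        # e.g. "upload", "usb_write", "file_read"
    destination: str   # e.g. a domain name or a device label
    bytes_moved: int

# Assumed watchlist and threshold, for illustration only.
CLOUD_STORAGE_DOMAINS = {"dropbox.com"}
LARGE_TRANSFER_BYTES = 500 * 1024 * 1024  # 500 MiB

def match_indicators(event: Event) -> List[str]:
    """Return the names of the known indicators that this event matches."""
    hits = []
    if event.action == "upload" and event.destination in CLOUD_STORAGE_DOMAINS:
        hits.append("upload_to_cloud_storage")
    if event.action == "usb_write":
        hits.append("usb_transfer")
    if event.bytes_moved >= LARGE_TRANSFER_BYTES:
        hits.append("large_volume_transfer")
    return hits

# Example: a large upload to a watched domain matches two rules.
example = Event("alice", "upload", "dropbox.com", 600 * 1024 * 1024)
example_hits = match_indicators(example)
```

The limitation described above is visible in the code: an exfiltration channel that nobody has written a rule for produces an empty list.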

A second approach focuses on anomalies in observed behavior, looking for cases where there is some appreciable deviation from what is standard, normal, or expected. Because insider threats are typically accompanied by behavioral changes in an individual, this approach may uncover even the early stages of a threat. Unfortunately, the number of false positives with this approach is tremendous. Benign changes in behavior (a change in job function or team, a return to work after a vacation, or just an off day) can all trigger high volumes of anomalies. As such, for even modestly complex environments, anomalies are often more useful as trailing indicators supporting an investigation than as leading indicators predicting behavior.
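One simple way to picture anomaly detection is a z-score against a user's own history. This is a minimal sketch under assumed inputs (daily counts of some activity, and a threshold of 3 standard deviations); real systems typically use richer models, and none of these names come from the article.

```python
from statistics import mean, stdev
from typing import List

def zscore(history: List[float], today: float) -> float:
    """Standard deviations between today's value and the user's own baseline."""
    if len(history) < 2:
        return 0.0  # not enough history to define a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

def is_anomalous(history: List[float], today: float, threshold: float = 3.0) -> bool:
    """Flag values that deviate from the baseline by at least `threshold` sigmas."""
    return abs(zscore(history, today)) >= threshold

# Hypothetical daily counts of files accessed by one user.
baseline = [10, 12, 11, 9, 10, 11, 12]
spike_flagged = is_anomalous(baseline, 40)    # unusually heavy day
quiet_flagged = is_anomalous(baseline, 0)     # e.g. a day of leave
normal_flagged = is_anomalous(baseline, 11)   # ordinary day
```

Both the heavy day and the day of leave are flagged, which illustrates the false-positive problem: the score measures deviation, not intent.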

The third and more advanced approach generates narratives from the output of both the first and second approaches, combining known indicators as a strong signal with appropriately weighted inputs from the weaker, noisier anomaly-based approach. This approach provides stability from the known indicators available via domain expertise, while harnessing the full power of the massive data sets driving anomaly-based approaches. This is challenging in practice, and the specific mechanism for balancing the two will evolve over time, but the approach continues to show promise and progress towards genuinely intelligent machines. The future is bright, unless you’re an insider threat!
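The weighting idea can be sketched as a single risk score in which indicator matches count for more than anomaly scores. The weights, caps, and function name below are assumptions made for illustration; the article does not specify how the two signals are actually balanced.

```python
def combined_risk(indicator_hits: int,
                  anomaly_z: float,
                  indicator_weight: float = 0.7,
                  anomaly_weight: float = 0.3,
                  hits_cap: int = 3,
                  z_cap: float = 6.0) -> float:
    """Blend a strong signal (known indicators) with a weak one (anomalies).

    Each input is clipped and scaled to [0, 1] before weighting, so with
    the default weights the result also lies in [0, 1].
    """
    indicator_signal = min(max(indicator_hits, 0), hits_cap) / hits_cap
    anomaly_signal = min(abs(anomaly_z), z_cap) / z_cap
    return indicator_weight * indicator_signal + anomaly_weight * anomaly_signal

# A user with an extreme anomaly but no indicator matches...
anomaly_only = combined_risk(indicator_hits=0, anomaly_z=26.0)
# ...ranks below a user with several indicator matches and no anomaly.
indicators_only = combined_risk(indicator_hits=3, anomaly_z=0.0)
```

With these assumed weights, anomalies alone can never outrank a full set of indicator matches, but they do raise the score of a user who already matches an indicator.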

It's National Insider Threat Awareness Month. If you'd like to learn how Vectra can help, you can schedule a demo.
