Anomaly detection is the process of identifying unusual items, events, or observations that raise suspicion because they differ from normal, expected behavior. Anomalous data can indicate critical incidents. In the context of cyber threat intelligence, anomaly detection involves identifying potentially malicious activities such as intrusion attacks, password spraying attacks, and data exfiltration. Anomaly detection rests on two basic assumptions: anomalies, or outliers, occur rarely in the data, and they differ significantly from the expected pattern in the context being considered.
What is an anomaly?
Anomalies are patterns in the data that do not conform to a well-defined notion of normal behavior. They can be broadly classified into three categories:
- Point anomalies: The simplest and most common type of anomaly. An individual item is anomalous with respect to the rest of the data.
- Contextual anomalies: An item is anomalous only in a specific context. The anomalous behavior is determined using the values of the behavioral attributes within that context, so a value that is considered an anomaly in one context can be considered normal in another.
- Collective anomalies: An individual item in a specific data subset may not be anomalous on its own, but the joint observation of all items in the subset is.
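To make the difference between point and contextual anomalies concrete, here is a minimal Python sketch. It assumes a hypothetical metric (logins per hour), made-up baseline values, and a simple z-score test with a threshold of 3. None of these come from a specific product, and a z-score is only one of many possible tests.

```python
from statistics import mean, stdev

def z_score(value, baseline):
    """Number of standard deviations between value and the baseline mean."""
    return (value - mean(baseline)) / stdev(baseline)

# Hypothetical logins-per-hour history, split by context (illustrative values).
day_baseline = [38, 42, 40, 41, 39, 43, 37, 40, 41, 39]
night_baseline = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2]
all_hours = day_baseline + night_baseline

THRESHOLD = 3.0  # assumed cut-off; tune for real data

# Point anomaly: 400 logins is extreme no matter which hour it occurs in.
point_anomaly = abs(z_score(400, all_hours)) > THRESHOLD

# Contextual anomaly: 40 logins is ordinary overall and during the day,
# but far outside what has been seen at night.
normal_overall = abs(z_score(40, all_hours)) <= THRESHOLD
normal_by_day = abs(z_score(40, day_baseline)) <= THRESHOLD
anomalous_at_night = abs(z_score(40, night_baseline)) > THRESHOLD
```

In this sketch, 40 logins per hour is unremarkable against the daytime baseline and against all hours pooled together, but it stands out sharply against the night baseline. That dependence on context is what makes it a contextual anomaly rather than a point anomaly.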
Anomaly detection techniques
In general, anomaly detection techniques can operate in one of the following modes:
- Supervised anomaly detection: These techniques assume that labeled training data is available for both classes. A typical approach in such cases is to build a predictive model that separates the normal class from the anomalous class.
- Semi-supervised anomaly detection: Semi-supervised techniques assume that labels are available for only part of the training data. In general, only normal class data is labeled, so the model learns what normal looks like and flags whatever deviates from it.
- Unsupervised anomaly detection: This is the most widely applied type of anomaly detection. These techniques do not require labeled training data. They work under the assumption that the majority of instances in the data set are normal, and they look for the instances that seem to fit least with the remainder of the data set.
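As an illustration of the unsupervised mode, the sketch below scores each point by the distance to its k-th nearest neighbor, so points that fit least with the rest of the data get the highest scores. The data points and the choice of k = 2 are assumptions made for the example, and nearest-neighbor distance is just one simple scoring rule among many.

```python
import math

def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    No labels are used: a large score means the point sits far from
    the bulk of the data.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Illustrative, unlabeled 2-D observations: one tight group and one stray point.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 9.0)]

scores = knn_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

The brute-force distance computation is quadratic in the number of points, which is fine for a demonstration but would need an index or an approximate method on large data sets.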
Some approaches to anomaly detection rely on multivariate statistical analysis, such as dimensionality reduction with Principal Component Analysis (PCA), while others rely on machine learning techniques.
Machine learning approaches to anomaly detection include classification algorithms (a supervised technique) such as Support Vector Machines, clustering algorithms (an unsupervised technique) such as K-Means and DBSCAN, and artificial neural networks such as autoencoders.
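To show how a clustering algorithm can surface anomalies, here is a simplified sketch of the noise rule used by DBSCAN: a point is noise if it is not a core point (it has fewer than min_pts neighbors within eps, counting itself) and it is not within eps of any core point. The function name, data, and parameter values are assumptions for illustration. The sketch only extracts noise points and does not assign cluster labels, so it is not a full DBSCAN implementation.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return the indices of points that DBSCAN would label as noise."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    # A non-core point next to a core point is a border point, not noise.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]

# Two dense groups plus one isolated observation (illustrative values).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 20),
]
noise = dbscan_noise(points, eps=1.5, min_pts=3)
```

With these parameters, every point in the two groups has enough close neighbors to be a core point, and only the isolated observation is reported as noise. In practice, a library implementation such as scikit-learn's DBSCAN, which marks noise points with the label -1, is the more common choice.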
Challenges with Anomaly Detection
Although anomaly detection might look like a simple task at an abstract level, several challenges arise in practice. The first is the need to define normal behavior in the context being evaluated. Specifying every possible normal action is very hard, and the boundaries between normal and anomalous behavior are often imprecise. In addition, normal patterns can evolve over time, which makes it even harder to determine which behaviors are expected.
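One common way to cope with normal behavior that evolves over time is to compare each new value against a rolling window of recent history instead of a fixed baseline. The sketch below is a minimal example under assumed settings: the class name, window size, warm-up length, threshold, and the synthetic stream are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev

class RollingDetector:
    """Flag values that deviate strongly from a sliding window of recent values."""

    def __init__(self, window=10, threshold=3.0, warmup=5):
        self.history = deque(maxlen=window)  # old values fall out automatically
        self.threshold = threshold
        self.warmup = warmup

    def observe(self, value):
        """Return True if value is anomalous for the current window, then record it."""
        flagged = False
        if len(self.history) >= self.warmup:
            mu, sigma = mean(self.history), stdev(self.history)
            flagged = sigma > 0 and abs(value - mu) / sigma > self.threshold
        self.history.append(value)
        return flagged

# Synthetic metric that drifts upward slowly and ends with a sudden spike.
stream = [100 + i for i in range(30)] + [500]

detector = RollingDetector()
rolling_flags = [detector.observe(v) for v in stream]

# For comparison: a baseline frozen on the first 10 values.
mu0, sigma0 = mean(stream[:10]), stdev(stream[:10])
static_flags = [abs(v - mu0) / sigma0 > 3.0 for v in stream[10:30]]
```

In this sketch the rolling detector follows the slow drift and flags only the final spike, while the frozen baseline starts flagging ordinary drifted values. The trade-off is that this version records every value, including flagged ones, so a sustained attack could gradually pull the window toward the attacker's behavior.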
When anomalies come from malicious actions, adversaries adapt and try to make their behavior look normal, which makes it harder to tell malicious activity apart from legitimate activity. Another important challenge is that the notion of an anomaly varies across application domains, so a technique that works in one domain might not be sufficient in another.
Given this set of challenges, anomaly detection is not an easy problem to solve. Several factors influence the problem, such as the nature of the data, the types of anomalies to be identified, and the availability of labeled training data. Therefore, it is important to look for interdisciplinary approaches that draw on fields such as statistics, machine learning, data mining, and information theory to find the best solutions to this challenging problem.
Anomaly Detection at the Data Layer
Cyral’s platform makes it easy for engineering teams to observe, protect, and control data endpoints in a cloud and DevOps-first world. By generating logs, traces and metrics for all data activity, Cyral is able to help detect anomalies and threats at the data layer. To learn more about how Cyral uses anomaly detection to protect its customers from data breaches, register for a demo.