How Algorithms Learn and Adapt

By Sohrob Kazerounian

There are numerous techniques for creating algorithms that are capable of learning and adapting over time. Broadly speaking, we can organize these algorithms into one of three categories—supervised, unsupervised, and reinforcement learning.

Supervised learning refers to situations in which each instance of input data is accompanied by a desired or target value for that input. When the target values are a set of finite discrete categories, the learning task is often known as a classification problem. When the targets are one or more continuous variables, the task is called regression.

For example, classification tasks may include predicting whether a given email is spam or ham (i.e., not spam) or which of N families of malware a given binary file belongs to. A regression task might include predicting how many security incidents a host on a network will give rise to during a specific time period. In both cases, the goal of supervised learning is to learn a functional mapping from the input space the data is drawn from to a desired or target space that describes the data.
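To make the distinction concrete, here is a minimal sketch of both kinds of supervised task in plain Python. The feature choices (link and exclamation-mark counts per email, open ports per host), the toy data, and the function names are all assumptions made for illustration. A nearest-centroid classifier and a one-variable least-squares fit stand in for whatever model would be used in practice.

```python
from statistics import mean

# Hypothetical toy data: (number of links, number of exclamation marks) per email.
emails = [(8, 5), (7, 6), (9, 4), (1, 0), (0, 1), (2, 1)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]


def fit_centroids(inputs, targets):
    """Classification: learn one mean feature vector per discrete class."""
    centroids = {}
    for label in set(targets):
        members = [x for x, t in zip(inputs, targets) if t == label]
        centroids[label] = tuple(mean(col) for col in zip(*members))
    return centroids


def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


centroids = fit_centroids(emails, labels)
print(classify(centroids, (6, 5)))  # spam
print(classify(centroids, (1, 1)))  # ham

# Hypothetical toy data: open ports on a host vs. incidents in a month.
open_ports = [1, 2, 3, 4, 5]
incidents = [2.0, 4.0, 6.0, 8.0, 10.0]
slope, intercept = fit_line(open_ports, incidents)
print(slope * 6 + intercept)  # 12.0
```

In both halves the targets (the labels and the incident counts) drive the learning. The only difference is whether the target is a discrete category or a continuous value.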

In contrast, unsupervised learning refers to scenarios in which an algorithm or agent must learn from raw data alone, without any feedback or supervision in the form of target values. This often means learning to group together similar examples in the data—a task known as clustering—or learning something about the underlying distributions in the input space from which the data are drawn.

For example, clustering can be used to determine groups of machines in a network that are similar based on features like the number of internal vs. external hosts that they initiate connections to and the numbers of hosts that initiate connections with them.
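The grouping described above could be sketched with k-means, one common clustering algorithm (the article does not name a specific one). The three features per machine, the toy values, and the choice to seed the centroids with the first and last machine are assumptions for illustration only.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: alternate between assignment and centroid update."""
    centroids = list(initial_centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(mean(col) for col in zip(*members))
    return assignment, centroids


# Hypothetical features per machine:
# (internal hosts contacted, external hosts contacted, hosts connecting in)
machines = [
    (3, 40, 1), (4, 35, 0), (2, 50, 2),      # workstation-like behavior
    (60, 2, 80), (55, 1, 95), (70, 3, 90),   # server-like behavior
]

# Assumption: seed the two centroids with the first and last machine.
assignment, centroids = k_means(machines, [machines[0], machines[-1]])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied at any point. The two groups emerge purely from how similar the machines' connection patterns are.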

Alternatively, unsupervised methods can be used for anomaly detection by learning about the properties and statistics of normal traffic on a network, so that network connections that deviate too far from the norm can be labeled as anomalous. While it may be difficult to enumerate all the possible use-cases, the goal of unsupervised learning is to learn something about the underlying data without the use of predetermined labels that describe it.
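As a rough illustration of the anomaly detection idea, the sketch below learns the mean and standard deviation of a single traffic statistic and flags values that fall too many standard deviations away. The feature (kilobytes per connection), the sample values, and the threshold of 3 are assumptions. A real system would model many features, and more robustly.

```python
from statistics import mean, stdev


def fit_baseline(values):
    """Learn the 'norm' from unlabeled observations of normal traffic."""
    return mean(values), stdev(values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:  # degenerate baseline: any deviation counts as anomalous
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical kilobytes per connection observed during normal operation.
normal_traffic = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]
baseline = fit_baseline(normal_traffic)

print(is_anomalous(51, baseline))   # False
print(is_anomalous(400, baseline))  # True
```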

One of the historical developments that gave rise to the popularity of deep learning was unsupervised pretraining. It was used to learn an initial set of weights in a neural network before using supervised methods to fine-tune the learning for classes of images that the network was trying to categorize. Some good examples of unsupervised pretraining can be found in Fukushima’s 1975 neocognitron model and Hinton and Salakhutdinov’s deep Boltzmann machine.

Finally, reinforcement learning refers to learning paradigms under which an agent, interacting with an environment, takes a sequence of actions. Each action might change the state of the environment, and rather than receiving explicit feedback about the correct action to take at each step, the agent receives only general reward signals about how well it is performing.

Today, prototypical examples of reinforcement learning are found in video games, where an agent observes the state of the environment. In chess this might be the state of the board, and in a Nintendo game it may be the set of pixels on screen. At each time step, the agent decides on an appropriate action, such as which piece to move next, or whether to move left, right, up, or down or press A or B.

The agent doesn’t receive explicit feedback about the best action to take at each step. Instead, it gets feedback through a reward signal that comes potentially quite some time after the action is taken. For example, it can be a positive reward for winning a chess game or beating a level, or alternatively a negative reward for every point scored by an opponent in a game such as Pong.
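The delayed-reward setting can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (the article does not name a specific one). The five-cell corridor environment, the hyperparameters, and all function names below are hypothetical. The agent is rewarded only when it reaches the rightmost cell, so earlier moves get credit only indirectly.

```python
import random

N_STATES = 5                  # corridor cells 0..4; reaching cell 4 ends an episode
ACTIONS = ("left", "right")


def step(state, action):
    """Hypothetical environment: a reward arrives only at the far right cell."""
    move = 1 if ACTIONS[action] == "right" else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng, epsilon):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from the reward signal alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap so every episode terminates
            action = choose_action(q[state], rng, epsilon)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = [
    ACTIONS[max(range(len(ACTIONS)), key=lambda a: q_table[s][a])]
    for s in range(N_STATES - 1)
]
print(policy)
```

The discount factor `gamma` is what lets a reward received at the end of an episode propagate back to the earlier actions that made it possible.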

Using the reward signal alone, the agent should attempt to increase the likelihood of taking actions that lead to positive rewards and decrease the likelihood of taking actions that lead to negative or punishing rewards. Reinforcement learning is likely to become an increasingly important aspect of automated pen-testing and intrusion detection. However, it remains relatively nascent in the cybersecurity domain and is not yet likely to perform well in production scenarios.

That said, the AI team at Vectra has already begun researching and developing reinforcement learning algorithms to create agents that learn attack behaviors that can evade intrusion detection systems (IDSs), thereby enabling the automated creation of data from which an IDS can learn.