Near and Long-term Directions for Adversarial AI in Cybersecurity

September 12, 2018
Sohrob Kazerounian
Distinguished AI Researcher

The frenetic pace at which artificial intelligence (AI) has advanced in the past few years has begun to have transformative effects across a wide variety of fields. Coupled with an increasingly (inter)-connected world in which cyberattacks occur with alarming frequency and scale, it is no wonder that the field of cybersecurity has now turned its eye to AI and machine learning (ML) in order to detect and defend against adversaries.

The use of AI in cybersecurity not only expands the scope of what a single security expert is able to monitor, but importantly, it also enables the discovery of attacks that would have otherwise been undetectable by a human. Just as it was nearly inevitable that AI would be used for defensive purposes, it is undeniable that AI systems will soon be put to use for attack purposes.


We outline here the near and long-term trajectories these adversarial applications of AI are likely to take, given the history and state of AI.

Immediate applications

There are a number of areas in which the development of AI, and deep learning in particular, has specific applications that can nevertheless be modified for malicious purposes by hackers.

For example, many state-of-the-art techniques for natural language processing make use of a form of recurrent neural network known as an LSTM in order to process, classify, generate and even translate natural language. An LSTM language model trained on a dataset of speech or text can be used to generate new sentences in the same voice or manner as the text it was trained on. A model that learned to generate tweets in Trump’s voice is one example of this use.

Models such as these can readily be leveraged by hackers as one tool of many in their arsenal. For example, various families of malware will make use of domain generation algorithms (DGAs) in order to randomly construct new domains as rendezvous points so that infected machines can reach out to a command-and-control server. If the domains were hardcoded, it would be trivial for a network administrator to simply blacklist malicious domains.

Because randomly generated domains look quite different from the kinds of domains that any human would register, it is relatively easy to build models that distinguish normal domains from DGA domains. An LSTM model trained on normal domain names, however, could easily construct fake domains that look indistinguishable from what a human might pick.
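To make the detection side of this concrete, here is a minimal sketch of how a defender might score domain labels by how "human" they look. It is not an LSTM: it swaps in a much simpler character-bigram model, and the training list, the function names and the vocabulary size are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set of human-style domain labels (assumption).
BENIGN = ["google", "facebook", "wikipedia", "amazon", "youtube",
          "twitter", "linkedin", "netflix", "microsoft", "weather",
          "onlinebanking", "newsletter", "travelguide", "bookstore"]


def train_bigrams(words):
    """Count character bigrams, using ^ and $ as start and end markers."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        padded = "^" + word.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def score(label, model, vocab_size=38):
    """Average add-one-smoothed log-probability per bigram of `label`.

    vocab_size assumes 26 letters, 10 digits, a hyphen and the end marker.
    Higher (less negative) scores mean the label resembles the training data.
    """
    pair_counts, first_counts = model
    padded = "^" + label.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = 0.0
    for a, b in pairs:
        prob = (pair_counts[(a, b)] + 1) / (first_counts[a] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)


model = train_bigrams(BENIGN)
human_like = score("bookshelf", model)     # unseen, but pronounceable
random_like = score("xqzvkjwpfh", model)   # looks like classic DGA output
```

On this toy data the pronounceable label scores higher than the random string. The same property is what a generative model trained on real domain names would exploit: its outputs would score like human-chosen names and slip past a detector of this kind.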

Another class of models (which also often make use of LSTMs) is known as sequence-to-sequence (seq2seq) models. Seq2seq models, currently state-of-the-art in the field of translation, take as input a sequence in one domain or language (e.g., a sentence in English) and produce as output a sequence in another domain or language (e.g., a sentence in French).

These models can also be used, however, for a technique known as fuzzing, which automates the process of finding errors and security holes in code (Godefroid et al., 2017). The flaws found by these techniques can often be exploited through buffer overflows, SQL injection and similar attacks that give attackers total control of a system.
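For readers unfamiliar with fuzzing, the following is a minimal sketch of the basic loop: mutate an input, feed it to the target and record anything that fails in an unexpected way. It uses plain random mutation rather than a learned seq2seq generator, and the toy parser, its deliberate bug and all names are hypothetical.

```python
import random


def parse_record(data: bytes) -> int:
    """Toy parser: one length byte, then that many payload bytes.

    Returns the last payload byte. Contains a deliberate bug for the
    fuzzer to find: it assumes the payload is never empty.
    """
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated record")  # clean rejection
    return payload[-1]  # IndexError when length == 0


INTERESTING = [0x00, 0x01, 0x7F, 0x80, 0xFF]  # common boundary values


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a boundary value or a random value."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    if rng.random() < 0.5:
        buf[pos] = rng.choice(INTERESTING)
    else:
        buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 2000, rng_seed: int = 0):
    """Run `target` on mutated inputs and collect unexpected exceptions."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # the parser rejected malformed input as designed
        except Exception as exc:  # anything else counts as a crash
            crashes.append((candidate, exc))
    return crashes


crashes = fuzz(parse_record, b"\x03abc")
```

Every crash found here comes from an input whose length byte is zero. A learned generator aims to do better than blind mutation by producing inputs that are well-formed enough to get deep into the parser while still varying in the places where bugs hide.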

A sequence-to-sequence RNN model to generate PDF objects

In general, the areas in which AI and ML are most immediately applicable happen to be limited in scope and work only in conjunction with a human attacker making use of the system. The application areas are likely to be limited to speeding up automation of various types of tasks (as in the fuzzing case) or to mimicking human performance and behavior (as in the DGA case).

Near-term applications

Excerpts of a well-formatted PDF document.

As the state of AI develops in the next few years, techniques that have only recently been developed, such as generative adversarial networks (GANs), will begin to expand the scope of possibilities for attack.

Interestingly, GANs were first motivated by looking at adversarial attacks on existing deep learning methods: simple changes to inputs that would otherwise be indistinguishable to humans but would maximally confuse a neural network. Take the following example from Goodfellow et al. (2014):

Adding a slight amount of noise to an image of a panda yields an image that most humans cannot distinguish from the original.

The addition of a slight amount of noise to an image of a panda (left-hand side of the equation in the figure above) results in an image of a panda that is indistinguishable from the original to most humans (right-hand side of the figure above). Nevertheless, this slight addition changes the prediction of a neural network that was trained to recognize objects in images from “panda” to “gibbon.” A more recent example produced similar confusion by changing just a single pixel (Su et al., 2017).
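The panda example relies on the fast gradient sign method described by Goodfellow et al. (2014): nudge every input feature by a small amount epsilon in the direction that increases the model's loss. The sketch below applies that idea to a tiny logistic-regression classifier instead of a deep network. The weights, the input and the value of epsilon are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x) -> float:
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value: float) -> int:
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, y_true, epsilon):
    """Fast gradient sign perturbation for a logistic-regression model.

    For cross-entropy loss the gradient with respect to input feature i
    is (p - y_true) * w_i, so only its sign is needed.
    """
    p = predict(weights, bias, x)
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input (assumptions, not from the article).
weights = [1.0, -2.0, 0.5, 1.5]
bias = -0.1
x = [0.3, 0.1, 0.4, 0.2]

x_adv = fgsm(weights, bias, x, y_true=1, epsilon=0.15)
before = predict(weights, bias, x)      # above 0.5: classified as class 1
after = predict(weights, bias, x_adv)   # below 0.5: classified as class 0
```

No feature moves by more than epsilon, yet the predicted class flips. In a real image classifier the same step is taken over many thousands of pixels, which is why the per-pixel change can be too small for a human to notice.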

One-pixel attacks created with the proposed algorithm that successfully fooled a target DNN.

These sorts of attacks will become more prevalent as AI and ML find their way into our daily lives. Deep neural networks of the sort that were attacked in the examples above are the core of the vision systems that govern driverless cars, facial recognition (think of the cameras at border security when you re-enter the United States) and more.

The kinds of adversarial attacks shown above will increasingly be used for malicious behavior as more and more systems rely on automated AI solutions.

GANs, which were originally motivated by adversarial attacks, are also interesting in their own right. GANs are coupled neural networks with competition between a generator network, whose job it is to generate some output, and a discriminator network, whose job it is to determine if the input it sees was generated by the generator or drawn from a real dataset.
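The competition between the two networks is usually written as a minimax game. The formulation below is the standard one from the original GAN work; the symbols are not defined in this article and are introduced here only for illustration: G is the generator, D is the discriminator, p_data is the distribution of real examples and p_z is the noise distribution the generator samples from.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator tries to push D(x) toward 1 on real data and D(G(z)) toward 0 on generated data, while the generator tries to make D(G(z)) as large as possible.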

Amazingly, the game-theoretic back and forth that plays out between the networks results in a generator that can produce stunningly realistic outputs. This is particularly true in the domain of images, where GANs have begun to create images that look hyper-real (e.g., faces of celebrities that don’t exist; Karras et al., 2017), but GANs are also now being used to generate natural language.

These models will be able to generate realistic human speech in another person’s voice or code to achieve a particular goal or task. They will likely begin to be used to fool systems and humans by generating outputs that are indistinguishable from real ones.

1024 x 1024 images generated using the CELEBA-HQ dataset.

Long-term directions

In the long run, we expect the use of AI in adversarial or malicious settings to shift increasingly towards the field of reinforcement learning (RL). Unlike the models discussed so far, RL enables an AI agent to not only process inputs but, moreover, to make decisions in response to those inputs in a manner that may affect the environment itself.


The ability to observe an environment or input state and then take action in response to it closes what Jean Piaget referred to as the “action-perception loop” in humans. Without the ability to make decisions and act, an AI agent is effectively only capable of input processing. RL is what made modern game-playing AIs possible (e.g., Mnih et al., 2013) and what led to AI systems beating the world’s best Go players (Silver et al., 2017).

In essence, RL functions by giving an agent a positive reward when it achieves some goal and giving it a negative reward when it fails. The rewards should then increase the likelihood of taking responsive actions that are likely to lead to positive rewards, while inhibiting actions that are likely to lead to negative rewards.
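The reward mechanism can be sketched with a deliberately tiny example: an agent that repeatedly chooses between two abstract actions and keeps a running value estimate for each. This is a simple epsilon-greedy value update, not a full RL algorithm, and the action names, reward values and hyperparameters are all assumptions.

```python
import random


def train(rewards, episodes=200, alpha=0.1, epsilon=0.2, seed=0):
    """Learn a value estimate per action from positive and negative rewards.

    rewards: dict mapping each action name to the reward it yields.
    alpha:   step size for the value update.
    epsilon: probability of trying a random action instead of the best one.
    """
    rng = random.Random(seed)
    actions = sorted(rewards)
    values = {action: 0.0 for action in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)            # explore
        else:
            action = max(actions, key=values.get)   # exploit best estimate
        reward = rewards[action]
        # Move the estimate a small step toward the observed reward.
        values[action] += alpha * (reward - values[action])
    return values


# Hypothetical task: one action achieves the goal, the other fails.
values = train({"action_a": 1.0, "action_b": -1.0})
preferred = max(values, key=values.get)
```

After training, the rewarded action has a positive value estimate and is the one the agent prefers, while the punished action's estimate has been pushed down. Real systems replace the lookup table with a neural network and the two actions with a large, state-dependent action space.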

Some form of RL (or related methods that develop from it) will be required in order to create AI agents that can autonomously recon, target and attack a network. Indeed, at Vectra, we have already constructed rudimentary agents that can learn to scan networks in a manner that evades detection systems. These systems were trained by rewarding the agents for the information collected, while punishing them every time they were caught.

This sort of training is not simple, however, since there are no clear methods for defining what an environment is and what the space of possible actions is (unlike Atari games or even a notoriously difficult game like Go, in which the state space and action space are relatively clear).

There is even now a project attempting to use the Metasploit API to create a series of states and actions that are easily ingestible by RL algorithms and can then be used by algorithms developed in TensorFlow. The project, called DeepExploit, was presented at Black Hat 2018.

Ultimately, it is this last category of AI for malicious or attack behaviors that has historically captured the imaginations of sci-fi writers and the public at large. But long before these kinds of agents come to be, AI and ML will be used for a wide variety of attacks – some of which we can already predict in the pipeline and others of which we simply won’t know until they happen.

Godefroid, P., Peleg, H., & Singh, R. (2017, October). Learn&fuzz: Machine learning for input fuzzing. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (pp. 50-59). IEEE Press.

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Chen, Y. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354.

Su, J., Vargas, D. V., & Kouichi, S. (2017). One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864.