Deep learning refers to a family of machine learning algorithms that can be used for supervised, unsupervised and reinforcement learning. These algorithms have become popular after many years in the wilderness. The name comes from the realization that adding increasing numbers of layers, typically in a neural network, enables a model to learn increasingly complex representations of the data. Inspired by the biological structure and function of neurons in the brain, deep learning relies on large, interconnected networks of artificial neurons.
Artificial neural network models are inspired by the biological circuitry that makes up the human brain. The first artificial neural networks were created to show how biological circuitry could compute the truth values of statements in propositional logic. Neural networks learn relevant features from a data set and build increasingly complex representations of these features as data flows into higher network layers.
Warren S. McCulloch and Walter Pitts showed how to construct a series of logic gates that could compute these binary truth values. The neurons in their model are individual units that integrate activity from other neurons. Each connection between neurons is weighted to simulate synaptic efficacy: the ability of a presynaptic neuron to activate a postsynaptic neuron.
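To make the idea of "neurons as logic gates" concrete, here is a minimal sketch of a McCulloch-Pitts-style threshold unit. The function names, weights and thresholds are illustrative choices. The original 1943 model treated inhibition as absolute, so the negative weight used for NOT is a common modern simplification and should not be read as their exact formulation.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def NOT(a):
    # A negative (inhibitory) weight with threshold 0 inverts the input.
    return mp_neuron([a], [-1], threshold=0)

def XOR(a, b):
    # No single threshold unit computes XOR, but a series of them can.
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND", AND(a, b), "OR", OR(a, b), "XOR", XOR(a, b))
```

The XOR gate shows why chaining units matters: each unit only draws a single linear threshold, but a small network of them can compute functions no single unit can.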
Even though they considered the temporal sequence of processing and wanted to include feedback loops, their models were unable to learn or adapt.
Although various learning rules exist to train a neural network, at its most basic, learning can be thought of as follows: a neural network is presented with some input, and activity propagates through its interconnected neurons until it reaches a set of output neurons.
These output neurons determine the kind of prediction the network makes. For example, to recognize handwritten digits, we could give the network 10 output neurons, one for each digit from 0 to 9. The neuron with the highest activity in response to an image of a digit denotes the network's prediction of which digit it saw.
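The forward pass and "most active output wins" rule can be sketched in a few lines of plain Python. The input size (a flattened 8x8 image), the single hidden layer of 16 sigmoid units and the weight range are all hypothetical choices for illustration. Because the weights are untrained, the predicted digit is arbitrary.

```python
import math
import random

N_INPUTS = 64    # hypothetical: a flattened 8x8 grayscale digit image
N_HIDDEN = 16    # hypothetical hidden-layer size
N_OUTPUTS = 10   # one output neuron per digit, 0 through 9

rng = random.Random(0)

def random_layer(n_in, n_out):
    """An untrained layer: one row of random input weights per neuron."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

W_hidden = random_layer(N_INPUTS, N_HIDDEN)
W_output = random_layer(N_HIDDEN, N_OUTPUTS)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, activity):
    """Each neuron integrates weighted activity from the previous layer."""
    return [sigmoid(sum(w * a for w, a in zip(row, activity))) for row in weights]

def predict(image):
    hidden = layer(W_hidden, image)
    outputs = layer(W_output, hidden)
    # The most active output neuron is the predicted digit.
    return max(range(N_OUTPUTS), key=lambda d: outputs[d])

image = [rng.random() for _ in range(N_INPUTS)]  # stand-in for a real image
print("predicted digit:", predict(image))
```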
At first, the weights between the neurons are set to random values, so the network's first predictions about which digit is in an image will be essentially random. As each image is presented, the weights can be adjusted so that the network is more likely to output the correct answer the next time it sees a similar image.
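One concrete way to "adjust the weights so the correct answer becomes more likely" is a single gradient step on a softmax output with cross-entropy loss. This is a common choice and only one of many possible update rules. The sketch below assumes a single-layer classifier, a hypothetical 64-pixel input and an illustrative learning rate. It shows the probability assigned to the correct digit rising after one update.

```python
import math
import random

N_INPUTS, N_DIGITS = 64, 10   # hypothetical: flattened 8x8 image, digits 0-9

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def probabilities(W, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def update(W, x, label, lr=0.05):
    """One gradient step on the cross-entropy loss -log p[label].

    The gradient w.r.t. output k's weights is (p_k - y_k) * x, where y is one-hot.
    """
    p = probabilities(W, x)
    return [
        [w - lr * (p[k] - (1.0 if k == label else 0.0)) * xi for w, xi in zip(row, x)]
        for k, row in enumerate(W)
    ]

rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_DIGITS)]
x = [rng.random() for _ in range(N_INPUTS)]  # stand-in for an image of a "3"
label = 3

before = probabilities(W, x)[label]
W = update(W, x, label)
after = probabilities(W, x)[label]
print(f"p(correct) before: {before:.3f}, after: {after:.3f}")
```

The update raises the correct output's input weights along `x` and lowers every other output's. For a non-negative input, this always increases the correct digit's probability for that image.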
By adjusting the weights in this manner, a neural network can learn which features and representations are relevant for correctly predicting the class of the image, rather than requiring this knowledge to be predetermined by hand.
While any procedure for updating the weights in this manner can suffice – for example, biological learning laws, evolutionary algorithms and simulated annealing – the primary method used today is known as backpropagation.
The backpropagation (backprop) algorithm, discovered several times by different researchers from the 1960s onward, applies the chain rule to compute how the network's error changes with respect to changes in each of its weights. These gradients let the network adapt its weights according to an update rule based on gradient descent.
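Here is a minimal sketch of backpropagation in plain Python. It assumes a tiny 2-3-1 network with sigmoid units and squared-error loss, and it uses a plain gradient descent update. The layer sizes, learning rate and function names are illustrative choices, not a description of any particular production system. The backward pass walks the chain rule from the output error back to each weight.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_in=2, n_hidden=3, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W1, W2

def forward(W1, W2, x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)))
    return h, y

def loss(W1, W2, x, target):
    _, y = forward(W1, W2, x)
    return 0.5 * (y - target) ** 2

def backprop(W1, W2, x, target):
    """Chain rule: dL/dw for every weight, working backward from the output."""
    h, y = forward(W1, W2, x)
    # dL/dy * dy/dz_out, using sigmoid'(z) = y * (1 - y)
    delta_out = (y - target) * y * (1 - y)
    grad_W2 = [delta_out * hj for hj in h]
    # Send the error back through W2 and the hidden sigmoids.
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    return grad_W1, grad_W2

def sgd_step(W1, W2, x, target, lr=0.5):
    """Gradient descent update rule: w <- w - lr * dL/dw."""
    g1, g2 = backprop(W1, W2, x, target)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    W2 = [w - lr * g for w, g in zip(W2, g2)]
    return W1, W2

W1, W2 = init_params()
x, target = [1.0, 0.0], 1.0
before = loss(W1, W2, x, target)
for _ in range(100):
    W1, W2 = sgd_step(W1, W2, x, target)
after = loss(W1, W2, x, target)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

A standard sanity check is to compare the backprop gradients against finite differences of `loss`. If they agree, the chain-rule bookkeeping is correct.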
Even with these rules in place for neural networks to operate and learn effectively, a few more mathematical tricks were required to push deep learning to state-of-the-art levels.
One of the things that made learning in neural networks difficult, especially in deep or multilayered networks, was described mathematically by Sepp Hochreiter in 1991. This became known as the vanishing gradient problem; its dual is now referred to as the exploding gradient problem.
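The core of both problems can be seen in a toy chain of scalar units. By the chain rule, the gradient of the output with respect to the input is a product of one local factor per layer. The sketch below is an illustration, not a reproduction of Hochreiter's analysis, and its depths and weights are arbitrary choices. With sigmoid units each factor is at most 0.25 times the weight, so with modest weights the product shrinks geometrically. If the factors are consistently larger than 1, as in the identity-activation chain with weight 1.5, the product blows up instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_chain_gradient(depth, weight, x=0.5):
    """d(output)/d(input) through `depth` scalar sigmoid units sharing one weight.

    Reusing a single weight mimics a recurrent unit unrolled over time steps.
    By the chain rule the result is a product of local factors
    weight * sigmoid'(z), and sigmoid'(z) = s * (1 - s) is at most 0.25.
    """
    grad, a = 1.0, x
    for _ in range(depth):
        s = sigmoid(weight * a)
        grad *= weight * s * (1.0 - s)
        a = s
    return grad

def linear_chain_gradient(depth, weight):
    """The same chain with identity activations: each layer contributes `weight`."""
    return weight ** depth

for depth in (1, 5, 10, 20):
    print(depth, sigmoid_chain_gradient(depth, 1.0), linear_chain_gradient(depth, 1.5))
```

At depth 20 the sigmoid chain's gradient is below 0.25**20 (about 1e-12), while the linear chain with weight 1.5 grows past 3,000. Weights near the input then receive almost no learning signal in the first case and unstable, huge updates in the second.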
Hochreiter’s analysis motivated the development of a class of recurrent neural network (RNN) known as long short-term memory (LSTM), which is deep in time rather than deep in space. LSTMs overcame many of the difficulties faced by earlier RNNs, and today they remain among the state-of-the-art models for temporal or sequential data.
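The following is a minimal scalar sketch of one LSTM step, assuming the common modern formulation. That formulation includes a forget gate, which was added to LSTM after the original 1997 design. All parameter values are hypothetical and hand-picked. The key point is that the cell state is updated additively (`c = f * c_prev + i * g`). When the forget gate stays near 1, a stored value, and the gradient flowing back through `c`, is largely preserved across many steps. A plain RNN unit instead squashes its state through `tanh` at every step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (modern variant with a forget gate).

    p maps gate names to (input weight, recurrent weight, bias) triples.
    """
    def pre(gate):
        w, u, b = p[gate]
        return w * x + u * h_prev + b
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    i = sigmoid(pre("input"))    # how much of the candidate value to write
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate value
    c = f * c_prev + i * g       # additive update: the key to long memory
    h = o * math.tanh(c)
    return h, c

# Hypothetical hand-picked parameters: a strongly open forget gate,
# an input gate that opens only when x is large, and zero recurrent weights.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input":  (10.0, 0.0, -5.0),
    "output": (0.0, 0.0, 5.0),
    "cell":   (2.0, 0.0, 0.0),
}

h, c = lstm_step(1.0, 0.0, 0.0, params)  # write a value into the cell
c_stored = c
for _ in range(50):                       # then 50 steps of zero input
    h, c = lstm_step(0.0, h, c, params)
c_final = c

# For contrast, a plain RNN unit h_t = tanh(2 * x_t + 0.9 * h_{t-1}).
rnn_h = math.tanh(2.0 * 1.0)
for _ in range(50):
    rnn_h = math.tanh(0.9 * rnn_h)

print(f"LSTM cell: {c_stored:.3f} -> {c_final:.3f}; plain RNN: {math.tanh(2.0):.3f} -> {rnn_h:.2e}")
```

In a trained LSTM these gate parameters are learned by backpropagation through time rather than set by hand. The hand-picked values here only isolate the memory mechanism.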
Parallel developments for feedforward and convolutional neural networks would similarly advance their ability to outperform traditional machine learning techniques across a wide range of tasks.
In addition to hardware advances like the proliferation of graphics processing units (GPUs) and the ever-increasing availability of data, smarter weight initializations, novel activation functions, and better regularization methods have all helped neural networks function as well as they now do.
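As one example of a smarter weight initialization paired with a newer activation function, the sketch below compares two initializations for a deep stack of ReLU layers. The first draws weights with a small fixed standard deviation. The second uses He (Kaiming) initialization, which sets the standard deviation to sqrt(2 / fan_in). He initialization is a specific technique swapped in here for illustration; the article does not name which initializations it has in mind. The width, depth and the fixed 0.01 standard deviation are arbitrary choices.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def deep_relu_forward(x, depth, weight_std, rng):
    """Push x through `depth` fully connected ReLU layers with Gaussian weights (no biases)."""
    a = x
    n = len(x)
    for _ in range(depth):
        W = [[rng.gauss(0.0, weight_std) for _ in range(n)] for _ in range(n)]
        a = [relu(sum(w * aj for w, aj in zip(row, a))) for row in W]
    return a

def mean_square(v):
    return sum(t * t for t in v) / len(v)

WIDTH, DEPTH = 100, 20
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(WIDTH)]

naive = deep_relu_forward(x, DEPTH, weight_std=0.01, rng=rng)
he = deep_relu_forward(x, DEPTH, weight_std=math.sqrt(2.0 / WIDTH), rng=rng)

print(f"small fixed std: {mean_square(naive):.3e}")
print(f"He init:         {mean_square(he):.3e}")
```

The factor of 2 in He initialization compensates for ReLU zeroing out roughly half of its inputs. As a result, the mean squared activation stays on the order of 1 from layer to layer. With the small fixed standard deviation it shrinks by about a factor of 200 per layer, leaving essentially nothing to learn from at depth 20.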
Vectra AI uses deep learning and artificial neural networks to build complex representations of learned, relevant features from a data set. Since these representations are learned, rather than predetermined by data scientists, they are extremely powerful when used to solve highly complex problems.
Sohrob Kazerounian is the senior data scientist at Vectra AI. Sohrob is highly experienced in artificial intelligence, deep learning, recurrent neural networks, and machine learning. Before joining Vectra, he did machine learning and artificial intelligence work for SportsManias. Prior to SportsManias, he was a postdoctoral research fellow at IDSIA. He received a B.S. in cognitive science as well as computer science and engineering from the University of Connecticut and a Ph.D. in cognitive neural systems from Boston University.