Neural networks are powering a wide range of deep learning applications in different industries with use cases such as natural language processing (NLP), computer vision and drug discovery. There are different types of neural networks for different applications such as:
- Feedforward neural networks
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs)
In this article, we will explore RNNs and their use cases.
What are recurrent neural networks (RNNs)?
Recurrent neural networks (RNNs) are a class of artificial neural networks that take the output from previous steps as input to the current step. In this sense, RNNs have a “memory” of what has been calculated before. This makes them a good fit for sequential problems such as natural language processing (NLP), speech recognition, or time series analysis, where current observations depend on previous ones.
What is the difference between RNNs and other neural network algorithms?
RNNs differ from feedforward and convolutional neural networks (CNNs) in that they have a temporal dimension. Other types of neural networks assume that the inputs and outputs of the model are independent of each other. In an RNN, the output at each step depends on the previous elements of the sequence.
Suppose you have a speech recognition problem containing the sentence “What time is it?”. The deployed algorithm in this problem needs to account for the specific sequence of words for the output to make sense. As illustrated below, the RNN predicts the next word in the sentence by using previous words as inputs.
Since inputs and outputs are independent of each other in other types of neural networks, they are more appropriate for problems that do not have a sequential property such as image recognition or tabular data analysis.
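To make the contrast concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not a trained model: the word vectors in `embed`, the weight matrix `W_h`, and both helper functions are hypothetical and randomly initialized. It compares an order-blind summary of the sentence with a recurrent update that processes one word at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each word is mapped to a random 3-dimensional vector.
vocab = ["what", "time", "is", "it"]
embed = {w: rng.normal(size=3) for w in vocab}

# Illustrative hidden-to-hidden weights (random, not learned).
W_h = rng.normal(size=(3, 3)) * 0.5

def order_blind(words):
    # Sums the word vectors, so any permutation of the words gives the same result.
    return sum(embed[w] for w in words)

def recurrent(words):
    # Updates a hidden state one word at a time, so the order of the words matters.
    h = np.zeros(3)
    for w in words:
        h = np.tanh(W_h @ h + embed[w])
    return h

a = ["what", "time", "is", "it"]
b = ["it", "is", "time", "what"]

print(np.allclose(order_blind(a), order_blind(b)))  # compares the two order-blind summaries
print(np.allclose(recurrent(a), recurrent(b)))      # compares the two recurrent hidden states
```

The order-blind summary cannot tell the two word orders apart, while the recurrent hidden state generally can, which is the property sequential problems rely on.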
How do RNNs work?
The image below demonstrates the basic structure of an RNN. The diagram on the right is the full (or unfolded) version of the diagram on the left.
- The bottom layer x is the input layer. The model inputs are denoted with x(t) where t is the time step. x(t) can be a word and its place in a sentence or the price of a stock on a specific day.
- The middle layer h is the hidden layer (or a stack of hidden layers), each with its own activation function. h(t) denotes the hidden state of the network at time step t. Hidden states act as the “memory” of the model and are calculated from the current input x(t) and the previous hidden state h(t-1).
- The top layer o is the output layer. o(t) represents the output of the model at time step t. The current output is computed from the current hidden state h(t), which in turn depends on the current input x(t) and on the previous hidden states. This is the distinguishing feature of RNNs: the current output depends on both the current input and the previous inputs.
- Parameters (U, V, W) represent the weights between inputs, hidden states, and outputs. They control how strongly each of these influences the others, and the same weights are reused at every time step.
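The structure described above can be sketched as a short forward pass in Python with NumPy. This is a minimal illustration, not a reference implementation. It assumes one common convention that the article does not spell out: U maps the input to the hidden state, W maps the previous hidden state to the current one, and V maps the hidden state to the output. The `tanh` activation, the layer sizes, and the random weights are also assumptions made for the example.

```python
import numpy as np

def rnn_forward(xs, U, W, V, h0):
    """Run a simple RNN over a sequence; return all outputs and hidden states."""
    h = h0
    hs, outs = [], []
    for x in xs:
        h = np.tanh(U @ x + W @ h)  # h(t) from the current input x(t) and h(t-1)
        o = V @ h                   # o(t) from the current hidden state h(t)
        hs.append(h)
        outs.append(o)
    return np.array(outs), np.array(hs)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size, steps = 3, 4, 2, 5
U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(output_size, hidden_size))

xs = rng.normal(size=(steps, input_size))  # a sequence of 5 input vectors
outs, hs = rnn_forward(xs, U, W, V, np.zeros(hidden_size))
print(outs.shape, hs.shape)  # (5, 2) (5, 4)
```

Note that the same U, W, and V are applied at every time step. Only the hidden state h changes as the sequence is processed.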
For more, you can check our article on how regular neural networks work. RNNs are an extension of these regular neural networks.
What are the use cases and applications of RNNs?
RNNs and their variants, long short-term memory (LSTM) networks and gated recurrent units (GRUs), are used in problems where the input data is sequential by nature. Applications with sequential data include:
- Time series analysis such as stock price forecasting
- Machine translation
- Speech recognition. Google’s voice search uses LSTM.
- Image captioning
- Sentiment analysis
What are the challenges with RNNs?
Recurrent neural networks suffer from a problem called the vanishing gradient, which also affects other deep neural networks. The problem arises during backpropagation, the algorithm that neural networks use to calculate how their parameters should be adjusted during training.
In short, the neural network model compares the difference between its output and the desired output and feeds this information back to the network to adjust parameters such as weights using a value called gradient. A bigger gradient value means bigger adjustments to the parameters, and vice versa. This process continues until a satisfying level of accuracy is reached.
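As a rough illustration of this feedback loop, the sketch below fits a single weight with gradient descent in plain Python. The one-parameter model `y = w * x`, the squared-error loss, the learning rate `lr`, and the function name `train` are all assumptions chosen to keep the example small. Real networks apply the same idea to many weights at once.

```python
def train(x, y_target, w=0.0, lr=0.1, steps=50):
    """Fit y = w * x to a single example by gradient descent on squared error."""
    for _ in range(steps):
        y = w * x                 # model output
        error = y - y_target      # difference from the desired output
        grad = 2 * error * x      # derivative of (w * x - y_target) ** 2 with respect to w
        w -= lr * grad            # a bigger gradient means a bigger adjustment
    return w

w = train(x=2.0, y_target=6.0)
print(round(w, 4))  # 3.0
```

Each pass moves `w` closer to 3.0, the value for which `w * x` matches the target, and the adjustments shrink as the error and the gradient shrink.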
RNNs are trained with the backpropagation through time (BPTT) algorithm, in which the gradient at each time step is calculated from the gradients of the steps that follow it. If the gradient is small at one step, it becomes even smaller as it is propagated back to earlier steps. This causes gradients to decrease exponentially, to a point where the model stops learning from the earlier parts of the sequence.
This is called the vanishing gradient problem, and it causes RNNs to have a short-term memory: earlier inputs have an increasingly small effect, or no effect, on the current output. This can be seen in the “What time is it?” problem above, where colors for earlier words shrink as the model moves through the sentence.
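The shrinking effect can be observed numerically. The following sketch, in Python with NumPy, tracks how sensitive the hidden state h(t) is to the initial state h(0) in a small `tanh` RNN. Everything here is an assumption made for illustration: the sizes, the random inputs, and the deliberately small weight scale of 0.1, which was chosen so that the decay is easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, steps = 8, 30

# Hypothetical small random weights (hidden-to-hidden W, input-to-hidden U).
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

h = np.zeros(hidden_size)
jac = np.eye(hidden_size)  # accumulates the derivative of h(t) with respect to h(0)
norms = []
for t in range(steps):
    x = rng.normal(size=hidden_size)
    h = np.tanh(W @ h + U @ x)
    step_jac = np.diag(1.0 - h**2) @ W  # derivative of h(t) with respect to h(t-1)
    jac = step_jac @ jac                # chain rule across time steps
    norms.append(np.linalg.norm(jac))

print(norms[0], norms[9], norms[-1])
```

Because each step multiplies by another factor, the norm falls off roughly exponentially with the number of steps, so the earliest inputs end up with almost no influence on the gradient.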
The vanishing gradient problem can be mitigated by using different RNN variants. Two of them are long short-term memory (LSTM) networks and gated recurrent units (GRUs). These architectures use mechanisms called “gates” to control which information to retain, how much of it to keep, and what to forget.
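To give a sense of what a gate does, here is a minimal sketch of a single GRU step in Python with NumPy. It follows one common formulation of the GRU, and sign conventions for the update gate vary between references. The parameter dictionary `p`, its key names, the sizes, and the random weights are hypothetical and exist only for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU step. `p` is a dict of weight matrices and bias vectors."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate: how much to overwrite
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate: how much past state to use
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand  # blend of the old state and the candidate

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for g in "zrh":
    p["W" + g] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p["U" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p["b" + g] = np.zeros(n_hid)

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h, p)
print(h.shape)  # (4,)
```

When the update gate `z` is close to 0, the previous state passes through almost unchanged, which gives the network a way to carry information across many time steps.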
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 60% of Fortune 500 every month.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.