8. Long Short-Term Memory: LSTM

LSTM (Long Short-Term Memory), introduced by S. Hochreiter and J. Schmidhuber in 1997, is a type of recurrent neural network (RNN) designed specifically to handle long-term dependencies in sequences.

LSTM networks work by maintaining a special memory cell, the cell state, which can carry information across long stretches of a sequence. The flow of information into, out of, and within the cell state is controlled by three gates: the input gate, the forget gate, and the output gate.

  • The input gate controls how much new information is written to the cell state.
  • The forget gate controls how much information is discarded from the cell state.
  • The output gate controls how much of the cell state is exposed as the hidden state, i.e. the output of the LSTM cell. (A short code sketch of these gates follows Fig.8-1 below.)
Fig.8-1: LSTM Architecture
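
To make the role of the three gates concrete, here is a minimal NumPy sketch of a single LSTM forward step. It is a sketch under the usual textbook formulation, not code taken from this document: the names `lstm_step`, `f_t`, `i_t`, `o_t`, `g_t`, `c_t`, and `h_t` are conventional notation, and packing the weights into dicts keyed by gate is my own illustrative choice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b are dicts holding the parameters of the forget (f),
    input (i), and output (o) gates and the candidate cell state (g).
    """
    # Forget gate: how much of the previous cell state to keep.
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate: how much new information enters the cell state.
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    # Output gate: how much of the cell state is exposed as output.
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    # Candidate values that may be written into the cell state.
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])
    # New cell state: forget part of the old state, add new candidates.
    c_t = f_t * c_prev + i_t * g_t
    # New hidden state: gated view of the cell state.
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Usage: run a random 5-step sequence through the cell.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W = {k: rng.normal(size=(hidden_dim, input_dim)) for k in "fiog"}
U = {k: rng.normal(size=(hidden_dim, hidden_dim)) for k in "fiog"}
b = {k: np.zeros(hidden_dim) for k in "fiog"}
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Note that the cell state update `c_t = f_t * c_prev + i_t * g_t` is purely elementwise: when the forget gate is near 1, information (and, during training, gradient) can pass through many time steps largely unchanged, which is what lets the LSTM handle long-term dependencies.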

There is a lot of good documentation that explains the LSTM architecture in detail. However, I have not found a clear and concise explanation of LSTM backpropagation. Therefore, the following sections delve into the backpropagation process and an LSTM implementation in detail.