7.1. Formulation of Simple RNN
The formulation of the simple RNN is defined as follows:
$$ \begin{cases} \hat{h}^{(t)} = W x^{(t)} + U h^{(t-1)} + b \\ h^{(t)} = f(\hat{h}^{(t)}) \end{cases} \tag{7.1} $$

Given that the numbers of input nodes and hidden nodes are $ m $ and $ h $, respectively, then:
- $ x^{(t)} \in \mathbb{R}^{m} $ is the input at time $t$.
- $ h^{(t)} \in \mathbb{R}^{h} $ is the hidden state at time $t$.
- $ W \in \mathbb{R}^{h \times m}$ and $U \in \mathbb{R}^{h \times h}$ are the weight matrices.
- $ b \in \mathbb{R}^{h}$ is the bias term.
- $ f(\cdot) $ is the activation function.
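The recurrence in Eq. (7.1) can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation; the function name `simple_rnn_forward` and the zero initial hidden state are assumptions for this example:

```python
import numpy as np

def simple_rnn_forward(xs, W, U, b, f=np.tanh):
    """Apply h_t = f(W x_t + U h_{t-1} + b) over a sequence.

    xs: array of shape (T, m). Returns all hidden states, shape (T, h).
    The initial hidden state h_0 is assumed to be the zero vector.
    """
    h = np.zeros(U.shape[0])  # h_0 = 0 (an assumption of this sketch)
    hs = []
    for x in xs:
        h = f(W @ x + U @ h + b)  # Eq. (7.1): pre-activation, then f
        hs.append(h)
    return np.stack(hs)

# Toy dimensions: m = 3 input nodes, h = 2 hidden nodes, T = 4 time steps.
rng = np.random.default_rng(0)
m, h, T = 3, 2, 4
W = rng.normal(size=(h, m))
U = rng.normal(size=(h, h))
b = np.zeros(h)
hs = simple_rnn_forward(rng.normal(size=(T, m)), W, U, b)
print(hs.shape)  # (4, 2): one h-dimensional hidden state per time step
```

Note that the same $W$, $U$, and $b$ are reused at every time step; only the hidden state changes as the sequence is consumed.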
Fig.7-3 illustrates the simple RNN defined above:
Since the simple RNN lacks an output layer, we add a dense layer (also known as a fully connected layer) that maps the final hidden state to the output. This makes the model a many-to-one RNN. See Fig.7-4.
Given that the number of output nodes is $ n $, the dense layer is defined as follows:
$$ \begin{cases} \hat{y}^{(T)} = V h^{(T)} + c \\ y^{(T)} = g(\hat{y}^{(T)}) \end{cases} \tag{7.2} $$

where:
- $ V \in \mathbb{R}^{n \times h} $ is the weight matrix.
- $ c \in \mathbb{R}^{n} $ is the bias term.
- $ y^{(T)} \in \mathbb{R}^{n} $ is the output vector.
- $ g(\cdot) $ is the activation function.
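Eq. (7.2) is an ordinary affine map followed by an activation, applied only to the final hidden state $h^{(T)}$. A minimal NumPy sketch, assuming a softmax activation for $g$ (the choice of $g$ depends on the task and is an assumption here):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def dense_output(h_T, V, c, g=softmax):
    """Eq. (7.2): y_T = g(V h_T + c), mapping h hidden nodes to n outputs."""
    return g(V @ h_T + c)

# Toy dimensions: h = 2 hidden nodes, n = 5 output nodes.
rng = np.random.default_rng(1)
h, n = 2, 5
V = rng.normal(size=(n, h))
c = np.zeros(n)
y = dense_output(rng.normal(size=h), V, c)
print(y.shape)  # (5,); with softmax, the entries sum to 1
```

In a many-to-one RNN, only $h^{(T)}$ is passed through this layer; the intermediate hidden states $h^{(1)}, \dots, h^{(T-1)}$ influence the output only through the recurrence.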