15. Overview

The Transformer is an encoder-decoder model, originally designed for machine translation tasks.

Fig.15-1 illustrates its architecture:

Fig.15-1: The Transformer - model architecture.

In Fig.15-1, the left and right sides represent the encoder and decoder components, respectively.

Both the encoder and the decoder consist of multi-head attention layers, position-wise feed-forward networks (FFNs), and normalization layers. In addition, the Transformer applies a technique called positional encoding to the word embeddings, injecting information about each token's position in the sequence.
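As a concrete illustration, the sinusoidal positional encoding from the original Transformer paper can be sketched in a few lines of NumPy. The function name and the example dimensions (10 tokens, model dimension 512) are chosen here for illustration:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding from the original Transformer paper:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(max_len)[:, np.newaxis]                # (max_len, 1)
    div_terms = 10000.0 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)   # even dimensions
    pe[:, 1::2] = np.cos(positions / div_terms)   # odd dimensions
    return pe

# The encoding is simply added to the word embeddings before the first layer
# (random embeddings stand in for a learned embedding table here):
embeddings = np.random.randn(10, 512)             # 10 tokens, d_model = 512
x = embeddings + sinusoidal_positional_encoding(10, 512)
```

Because each position maps to a distinct pattern of sine and cosine values, the model can distinguish token positions even though attention itself is order-agnostic.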


A unit consisting of a multi-head attention mechanism followed by a feed-forward network is often referred to as a transformer unit or a transformer block.

The following sections will delve into the structure of the Transformer as an encoder-decoder model, providing detailed explanations for each component:

  • Positional Encoding
  • Multi-Head Attention Mechanisms
  • Position-Wise Feed-Forward Network