17. Dig into the components

In this chapter, we will dig into the core components of the Transformer model. In particular, we will analyze the limitations of the original Transformer and explore the key improvements introduced by subsequent research.

Reference
  • A Survey of Transformers (20 Oct. 2022)
Part Contents

17.1. Multi-Head Attention
17.2. Positional Encoding
17.3. Position-wise Feed-Forward Networks (FFN)
