
4. Neural Network ... revisited

This chapter delves into advanced neural network concepts that are required to understand the Transformer model.

Chapter Contents

4.1. Dense Layer
4.2. Softmax Activation Function
4.3. Optimization
4.4. Exploding and Vanishing Gradients Problems
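
As a quick preview of the first two topics, here is a minimal NumPy sketch of a dense layer followed by a softmax activation. The names dense, softmax, W, b, and x are illustrative choices for this sketch, not identifiers from this book's implementations. A dense layer computes a weighted sum of all its inputs plus a bias, and softmax turns the resulting scores into a probability distribution.

import numpy as np

def dense(x, W, b):
    # Dense (fully connected) layer: each output unit is a
    # weighted sum of all inputs plus a bias term.
    return W @ x + b

def softmax(z):
    # Subtracting the maximum before exponentiating avoids
    # overflow; the outputs are positive and sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Usage: project a 4-dimensional input onto 3 class scores
# (the shapes here are arbitrary, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # weight matrix
b = np.zeros(3)                   # bias vector
x = rng.standard_normal(4)        # input vector
print(softmax(dense(x, W, b)))    # probabilities summing to 1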



