18. Related Topics
![](./4111635285_ce0c7d7c52_c.jpg)
This chapter delves into the evolution of the Transformer model, exploring its potential applications beyond natural language processing.
18.1. Large Language Models (LLMs)
18.2. Computer Vision
18.3. Time-series Analysis
18.4. Reinforcement Learning
18.5. Transformer Alternatives
18.1. Large Language Models (LLMs)
LLMs are AI models trained on massive amounts of text data. They learn the patterns of human language and can generate human-quality text, and they have already had a significant impact on society.
According to the paper “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond”, LLM technology has developed along three main streams:
- Encoder-Only
- Encoder-Decoder
- Decoder-Only
See the following figure cited from the paper:
Nearly all models in these three streams are rooted in the Transformer model[1]. For example, BERT uses only the encoder component of the Transformer, while GPT[2] uses only the decoder component.
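The practical difference between the encoder-only and decoder-only streams shows up in the self-attention mask. The minimal NumPy sketch below is a simplification, assuming a single attention head with identity query/key/value projections (no learned weights): with `causal=False` every token attends to every token, as in a BERT-style encoder, while with `causal=True` each token attends only to itself and earlier tokens, as in a GPT-style decoder.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax; entries set to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(x: np.ndarray, causal: bool) -> np.ndarray:
    """Single-head scaled dot-product self-attention weights, assuming identity
    Q/K/V projections for simplicity.

    causal=False: each token attends to all tokens (encoder, BERT-style).
    causal=True:  token i attends only to tokens 0..i (decoder, GPT-style).
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # True above the diagonal = "future" positions that must be hidden.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 toy token embeddings, dimension 8

enc_weights = attention_weights(tokens, causal=False)
dec_weights = attention_weights(tokens, causal=True)
print(np.round(dec_weights, 2))  # lower-triangular: no attention to later tokens
```

The only structural change between the two calls is the mask; this is what lets a decoder be trained to predict the next token without seeing it.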
Currently, the most powerful models are commercially developed. However, numerous open-source models have also been released, and some of them can run on personal computers or other consumer hardware.
- Large Language Models: A Survey (v1: 9.Feb.2024, v2: 20.Feb.2024)
- Large Language Models for Software Engineering: Survey and Open Problems (11.Nov.2023)
- A Survey of Large Language Models (v1: 31.Mar.2023, v13: 24.Nov.2023)
- A Survey on Evaluation of Large Language Models (17.Oct.2023)
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond (27.Apr.2023)
- Transformer Models: An Introduction and Catalog (26.Feb.2023)
18.1.1. MultiModal Large Language Models
Research publications on multimodal LLMs have surged since 2023.
The following figure is cited from the survey:
- MM-LLMs: Recent Advances in MultiModal Large Language Models (v1: 24.Jan.2024, v4: 20.Feb.2024)
18.1.2. Retrieval-Augmented Generation (RAG) for LLMs
RAG is a technique used to enhance the quality and reliability of text generated by LLMs by incorporating factual information retrieved from external knowledge sources.
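The retrieve-then-generate flow of RAG can be sketched in a few lines. This is a toy illustration, not a production pipeline: the documents are hypothetical, bag-of-words cosine similarity stands in for the dense embedding search that real systems typically use, and the LLM call itself is left out, so the sketch stops at building the augmented prompt.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base; a real system would query a document store.
DOCUMENTS = [
    "PostgreSQL uses MVCC to provide concurrent access without read locks.",
    "The Transformer architecture was introduced in 2017.",
    "VACUUM reclaims storage occupied by dead tuples in PostgreSQL.",
]

def tokenize(text: str) -> list[str]:
    # Lowercase and strip simple punctuation; real systems use a proper tokenizer.
    return [w.strip(".,?!").lower() for w in text.split()]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the question with retrieved passages before it is sent to an LLM.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

print(build_prompt("What does VACUUM do in PostgreSQL?", DOCUMENTS))
```

Grounding the prompt in retrieved text is what lets the model answer from facts it was never trained on, and the retrieved passages can be shown to the user as sources.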
The following figure is cited from the survey:
18.1.3. LLM for Site Reliability Engineering (SRE)
AIOps (Artificial Intelligence for IT Operations) has been studied for a relatively long time in the IT operations field[3]. These studies have focused on areas such as failure management and resource provisioning, mainly applying traditional machine learning methods to tasks like anomaly detection and root cause analysis.
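As an example of the traditional approach, the sketch below implements a rolling z-score detector, one of the simplest anomaly-detection baselines for operational metrics. The window size, threshold, and synthetic latency data are arbitrary assumptions for illustration.

```python
import numpy as np

def zscore_anomalies(values: np.ndarray, window: int = 10, threshold: float = 3.0) -> list[int]:
    """Return indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        mu, sigma = past.mean(), past.std()
        if sigma == 0:
            # Flat history: any change at all is unusual.
            is_anomaly = values[i] != mu
        else:
            is_anomaly = abs(values[i] - mu) / sigma > threshold
        if is_anomaly:
            anomalies.append(i)
    return anomalies

rng = np.random.default_rng(42)
latency_ms = rng.normal(loc=20.0, scale=1.0, size=100)  # synthetic query latencies
latency_ms[70] = 45.0                                    # injected spike
print(zscore_anomalies(latency_ms))  # should include index 70
```

Detectors like this flag that something is wrong but not why; explaining the cause is the root-cause-analysis step, which is where LLM-based agents are now being tried.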
Since 2023, LLM techniques have been introduced into the AIOps field.
While research applying LLMs in this area is still in its early stages, I believe AI has the potential to transform Site Reliability Engineering (SRE) from a discipline that relies heavily on individual skill into one grounded more firmly in engineering and scientific principles.
- Commentary Article:
- AIOps
- LLM for SRE/DBA
  - RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models (25.Oct.2023)
  - D-Bot: Database Diagnosis System using Large Language Models (v1: 3.Dec.2023, v2: 6.Dec.2023)
  - Panda: Performance Debugging for Databases using LLM Agents
  - LLM As DBA (v1: 10.Aug.2023, v2: 11.Aug.2023)
18.2. Computer Vision
Similar to the shift from RNNs to Transformers in NLP, computer vision has seen a transition from convolutional neural network (CNN) architectures to transformer-based architectures with the introduction of Vision Transformer (ViT) in 2020.
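ViT's key move is to treat an image like a sentence: it splits the image into fixed-size patches (16×16 pixels in the original paper) and flattens each patch into a vector that plays the role of a token. A minimal NumPy sketch of this patching step follows; the random matrix stands in for ViT's learned linear embedding, and the class token and position embeddings that ViT also uses are omitted.

```python
import numpy as np

def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size
    patches and flatten each, giving (num_patches, patch_size * patch_size * C)."""
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = image.reshape(h // p, p, w // p, p, c)  # (rows, p, cols, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)         # (rows, cols, p, p, C)
    return patches.reshape(-1, p * p * c)              # patches in row-major order

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (196, 768)

# Stand-in for the learned linear projection to the model dimension.
d_model = 64
projection = rng.normal(size=(tokens.shape[1], d_model))
embeddings = tokens @ projection
print(embeddings.shape)  # (196, 64)
```

After this step the image is just a sequence of 196 vectors, so a standard Transformer encoder can process it without any convolution.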
18.3. Time-series Analysis
Since sentences in natural language can be viewed as a type of time series data, it is natural to consider applying the Transformer model, originally introduced for NLP, to traditional time series analysis domains.
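To feed a series to a sequence model, it is usually cut into fixed-length context windows, each paired with the values that follow it, much as a language model sees a context of tokens and predicts the next one. The helper below is a hypothetical sketch of that preprocessing step; the context length and forecast horizon are free parameters, not values taken from any particular paper.

```python
import numpy as np

def make_windows(series: np.ndarray, context: int, horizon: int = 1):
    """Split a 1-D series into (inputs, targets): each input holds `context`
    consecutive values and each target holds the next `horizon` values."""
    n = len(series) - context - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested context and horizon")
    inputs = np.stack([series[i:i + context] for i in range(n)])
    targets = np.stack([series[i + context:i + context + horizon] for i in range(n)])
    return inputs, targets

series = np.arange(10, dtype=float)  # toy series 0.0, 1.0, ..., 9.0
X, y = make_windows(series, context=4, horizon=2)
print(X.shape, y.shape)  # (5, 4) (5, 2)
print(X[0], y[0])        # [0. 1. 2. 3.] [4. 5.]
```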
Additionally, a paper exploring time series analysis using Large Language Models (LLMs) was published in 2024.
- Transformer
  - Transformers in Time Series: A Survey (11.May.2023)
  - A Survey on Time-Series Pre-Trained Models (18.May.2023)
- LLM
IMO: While Transformer- and LLM-based analyses show promise, their lack of interpretability is a concern. Traditional time series analysis methods, on the other hand, provide valuable insights based on a strong mathematical foundation, so we shouldn’t abandon them just yet.
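To make the interpretability point concrete, the sketch below fits a first-order autoregressive model, AR(1), x_t = c + phi * x_{t-1} + e_t, by ordinary least squares on simulated data. Unlike a Transformer's weights, the fitted parameters have a direct meaning: phi measures how strongly each value depends on the previous one, |phi| < 1 means the process is stationary, and c / (1 - phi) is its long-run mean. The simulated parameter values are arbitrary assumptions.

```python
import numpy as np

def fit_ar1(x: np.ndarray) -> tuple[float, float]:
    """Estimate c and phi in x_t = c + phi * x_{t-1} + e_t by ordinary least squares."""
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])  # intercept, lagged value
    (c, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return float(c), float(phi)

# Simulate an AR(1) process with known (arbitrary) parameters, then recover them.
rng = np.random.default_rng(1)
true_c, true_phi = 2.0, 0.8
x = np.empty(5000)
x[0] = true_c / (1 - true_phi)  # start at the long-run mean, 10.0
for t in range(1, len(x)):
    x[t] = true_c + true_phi * x[t - 1] + rng.normal()

c_hat, phi_hat = fit_ar1(x)
print(f"phi ~ {phi_hat:.2f}, long-run mean ~ {c_hat / (1 - phi_hat):.2f}")
```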
18.4. Reinforcement Learning
The “third golden age” of AI arrived with breakthroughs like Deep Q-Networks (DQNs) mastering Atari games and AlphaGo defeating a Go world champion. These achievements popularized the term “Deep Learning.”
These achievements came from the field of reinforcement learning (RL), and Transformers are now starting to influence RL research as well.
Here are two notable examples:
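- Decision Transformer (2021)
This model tackles offline RL[4], where the agent cannot interact with the environment during training and must learn from previously collected trajectories. Instead of learning a value function, it treats RL as a sequence modeling problem: a causally masked Transformer reads a trajectory as a sequence of (return-to-go, state, action) tokens and predicts the next action conditioned on the desired return and the preceding states and actions.
- MAT: Multi-Agent Transformer (2022)
MAT addresses cooperative multi-agent settings, where several agents must act together. It casts multi-agent decision-making as a sequence modeling problem, mapping the sequence of the agents' observations to a sequence of their actions, so that the Transformer can capture the dependencies between agents.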
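The Decision Transformer's input format can be sketched as follows. The return-to-go at step t is the undiscounted sum of rewards from t to the end of the trajectory, and it is interleaved with states and actions. The example computes returns-to-go for a hypothetical four-step trajectory and builds the token sequence; the embeddings, the Transformer itself, and training are omitted.

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted sum of future rewards)."""
    return np.cumsum(rewards[::-1])[::-1]

def interleave(rtg, states, actions) -> list:
    """Build the token sequence (R_1, s_1, a_1, R_2, s_2, a_2, ...).
    A causally masked Transformer reads this sequence and is trained to predict each a_t."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq += [("return", float(r)), ("state", s), ("action", a)]
    return seq

# Toy offline trajectory (hypothetical data).
rewards = np.array([0.0, 1.0, 0.0, 5.0])
states = ["s1", "s2", "s3", "s4"]
actions = ["a1", "a2", "a3", "a4"]

rtg = returns_to_go(rewards)
seq = interleave(rtg, states, actions)
print(rtg)      # [6. 6. 5. 5.]
print(seq[:3])  # [('return', 6.0), ('state', 's1'), ('action', 'a1')]
```

At inference time, the first return-to-go is set to the return one wants to achieve and is reduced by each reward actually received, which is how the model is steered toward high-return behavior without a value function.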
- Paper
- Survey
  - Transformers in Reinforcement Learning: A Survey (12.Jul.2023)
  - Transformer in Reinforcement Learning for Decision-Making: A Survey (30.Oct.2023)
18.5. Transformer Alternatives
New alternatives to the Transformer are continually being proposed.
Here are some recent examples:
- Synthesizer: Rethinking Self-Attention in Transformer Models (v1: 2.May.2020, v3: 24.May.2021)
- FNet: Mixing Tokens with Fourier Transforms (v1: 9.May.2021, v4: 26.May.2022)
- Are Pre-trained Convolutions Better than Pre-trained Transformers? (v1: 7.May.2021, v2: 30.Jan.2022)
- Efficiently Modeling Long Sequences with Structured State Spaces (v1: 31.Oct.2021, v3: 5.Aug.2022)
- RWKV: Reinventing RNNs for the Transformer Era (v1: 22.May.2023, v2: 11.Dec.2023)
- (RetNet) Retentive Network: A Successor to Transformer for Large Language Models (v1: 17.Jul.2023, v4: 9.Aug.2023)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (1.Dec.2023)
- KAN: Kolmogorov-Arnold Networks (v1: 30.Apr.2024, v3: 24.May.2024)
- xLSTM: Extended Long Short-Term Memory (7.May.2024)
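Several of the alternatives above, notably S4 and Mamba, build on state-space models (SSMs). The sketch below shows a plain discrete, linear time-invariant SSM computed in two equivalent ways: as an RNN-like recurrence and as a causal convolution that can be evaluated in parallel over the whole sequence. It is a simplification: S4's structured state matrix and discretization are not modeled, nor are Mamba's input-dependent (selective) parameters, which rule out the convolution view and are computed with a parallel scan instead. The random matrices are arbitrary.

```python
import numpy as np

def ssm_recurrent(A, B, C, x):
    """h_t = A h_{t-1} + B x_t, y_t = C h_t, with h_{-1} = 0.
    Sequential, O(L) steps, like an RNN."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_convolutional(A, B, C, x):
    """Same model as a causal convolution with kernel K_k = C A^k B."""
    L = len(x)
    K = np.empty(L)
    Ak_B = B.copy()
    for k in range(L):
        K[k] = C @ Ak_B
        Ak_B = A @ Ak_B
    # y_t = sum_{k=0}^{t} K_k * x_{t-k}
    return np.array([np.dot(K[:t + 1], x[t::-1]) for t in range(L)])

rng = np.random.default_rng(0)
N = 4
A = 0.9 * np.eye(N) + 0.05 * rng.normal(size=(N, N))  # arbitrary, roughly stable
B = rng.normal(size=N)
C = rng.normal(size=N)
x = rng.normal(size=16)
print(np.allclose(ssm_recurrent(A, B, C, x), ssm_convolutional(A, B, C, x)))  # True
```

The recurrent form gives constant-memory, linear-time inference, while the convolutional form allows parallel training; this combination is a large part of why SSMs are pitched as Transformer alternatives for long sequences.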
These alternatives are not yet mainstream, but that does not imply they are inferior to the Transformer.
Currently, research resources primarily focus on the Transformer and its variants. However, if one of these alternatives or a future model proves significantly more effective than the Transformer, it could potentially become the dominant architecture.
[1] ELMo is an LSTM-based model. As for commercial models, it is impossible to say definitively whether they are Transformer-based, because some do not disclose details of their internal architecture.
[2] It’s fascinating to me that GPT-2, a decoder built from Transformer modules, presents only one formula in its paper:
$$
p(x) = \prod_{i=1}^{n} p(s_i \mid s_1, \ldots, s_{i-1})
$$
This is a familiar representation of language models for us. It reveals that N-gram models, RNN-based language models, and LLMs all share the foundation of statistical language models, even though LLMs can seem like technology from another world, or even magic, much as DNA forms the basis of all life, from bacteria to human beings.
[3] As a database engineer, I initially became interested in Deep Learning (and Reinforcement Learning) to explore the potential of applying LLMs to system administration tasks.
[4] Online reinforcement learning agents interact with an environment in real time. For example, an online agent learning Go plays games in real time and adjusts its strategy based on the outcome of each move.
In contrast, offline reinforcement learning trains the agent on a fixed dataset of pre-collected experiences, such as a historical dataset of Go games. This is useful for complex or costly environments where direct interaction is impractical, such as optimizing industrial processes based on historical data.