18. Related Topics

This chapter delves into the evolution of the Transformer model, exploring its potential applications beyond natural language processing.

18.1. Large Language Models (LLMs)

LLMs, a type of AI trained on massive amounts of text data, learn the patterns of human language and generate human-quality text. They have already had a significant impact on society.

According to the paper “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond”, there are three main streams of LLM technology:

  • Encoder-Only
  • Encoder-Decoder
  • Decoder-Only

See the following figure cited from the paper:

Nearly all of these trends are rooted in the Transformer model¹. For example, BERT leverages only the encoder component of the Transformer model, while GPT² solely employs the decoder component.
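
To make this distinction concrete, here is a minimal NumPy sketch (not any particular model's implementation) of single-head attention: encoder-only models such as BERT let every token attend to the whole sequence, while decoder-only models such as GPT apply a causal mask so each token attends only to itself and to earlier positions.

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Simplified single-head scaled dot-product attention."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq, seq) similarity scores
    if causal:
        # Decoder-only (GPT-style): mask out future positions so each token
        # attends only to itself and to earlier tokens (autoregressive).
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Encoder-only (BERT-style) corresponds to causal=False: no mask at all.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # weighted sum of the values

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)              # toy token embeddings
encoder_style = attention(x, x, x, causal=False)   # bidirectional context
decoder_style = attention(x, x, x, causal=True)    # causal / left-to-right
```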

Currently, the most powerful models are developed commercially. However, numerous open-source models have also been released, including several that can run on personal computers and other consumer hardware.

18.1.1. Multimodal Large Language Models

Research publications on multimodal LLMs have seen a significant surge since 2023.

The following figure is cited from the survey:


18.1.2. Retrieval-Augmented Generation (RAG) for LLMs

RAG is a technique used to enhance the quality and reliability of text generated by LLMs by incorporating factual information retrieved from external knowledge sources.

The following figure is cited from the survey:
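
As a rough illustration of the idea, rather than the specific method from the survey, the following sketch shows the retrieve-then-generate loop. Here embed and generate are hypothetical stand-ins for an embedding model and an LLM; only the NumPy retrieval math is real.

```python
import numpy as np

# Hypothetical stand-ins: in practice these would call an embedding model
# and an LLM (e.g. via a vendor API or a locally hosted model).
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    return f"<answer conditioned on: {prompt[:60]}...>"

def rag_answer(question: str, documents: list[str], top_k: int = 2) -> str:
    """Retrieve the most relevant documents, then let the LLM answer
    using them as context (retrieval-augmented generation)."""
    q_vec = embed(question)
    doc_vecs = np.stack([embed(d) for d in documents])
    scores = doc_vecs @ q_vec                  # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:top_k]     # indices of the best matches
    context = "\n".join(documents[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

docs = ["Tokyo is the capital of Japan.",
        "The Transformer was introduced in 2017.",
        "RAG combines retrieval with generation."]
print(rag_answer("When was the Transformer introduced?", docs))
```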

18.1.3. LLM for Site Reliability Engineering (SRE)

Studies on AIOps (Artificial Intelligence for IT Operations) have been conducted since the relatively early days of the IT operations field³. These studies have focused on areas such as failure management and resource provisioning, mainly applying traditional machine learning methods to tasks such as anomaly detection and root cause analysis.

Since 2023, the AIOps field has seen the introduction of LLM techniques.

While research with LLMs in this area is still in its early stages, I believe AI has the power to transform the field of Site Reliability Engineering (SRE) from a discipline heavily reliant on individual skills to one that incorporates more engineering and scientific principles.

18.2. Computer Vision

Similar to the shift from RNNs to Transformers in NLP, computer vision has seen a transition from convolutional neural network (CNN) architectures to Transformer-based architectures with the introduction of the Vision Transformer (ViT) in 2020.
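
The core trick of ViT is to treat an image as a sequence of flattened patches, so that a standard Transformer encoder can be applied unchanged. Below is a minimal NumPy sketch of the patch-embedding step, using illustrative sizes (224×224 images, 16×16 patches) and random matrices in place of learned weights; it is not the original implementation.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    patches = (image[:rows * patch, :cols * patch]
               .reshape(rows, patch, cols, patch, c)
               .transpose(0, 2, 1, 3, 4)          # group pixels by patch
               .reshape(rows * cols, patch * patch * c))
    return patches                                # (num_patches, patch*patch*C)

image = np.random.rand(224, 224, 3)               # toy input image
patch_size, d_model = 16, 512
tokens = patchify(image, patch_size)              # (196, 768) flattened patches
proj = np.random.randn(tokens.shape[1], d_model)  # "learned" linear projection
embeddings = tokens @ proj                        # (196, d_model) patch embeddings
cls_token = np.zeros((1, d_model))                # prepended [class] token
pos_embed = np.random.randn(embeddings.shape[0] + 1, d_model)  # position embeddings
x = np.concatenate([cls_token, embeddings]) + pos_embed        # Transformer encoder input
```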

18.3. Time-series Analysis

Since sentences in natural language can be viewed as a type of time series data, it is natural to consider applying the Transformer model, originally introduced for NLP, to traditional time series analysis domains.
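
For instance, a univariate series can be sliced into fixed-length windows that play the role of token sequences, with the following values as the prediction target. The sketch below shows only this framing step, with illustrative sizes; how the windows are then fed to a Transformer varies by model.

```python
import numpy as np

def make_windows(series: np.ndarray, context: int, horizon: int):
    """Slice a 1-D series into (input window, forecast target) pairs,
    analogous to (token sequence, next tokens) in language modeling."""
    inputs, targets = [], []
    for start in range(len(series) - context - horizon + 1):
        inputs.append(series[start:start + context])
        targets.append(series[start + context:start + context + horizon])
    return np.stack(inputs), np.stack(targets)

t = np.arange(0, 50, 0.1)
series = np.sin(t) + 0.1 * np.random.randn(len(t))   # toy noisy signal
X, y = make_windows(series, context=64, horizon=8)
print(X.shape, y.shape)   # (429, 64) (429, 8): sequences a Transformer can consume
```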

Additionally, a paper exploring time series analysis using Large Language Models (LLMs) was published in 2024.

IMO: While Transformer and LLM analyses show promise, their lack of interpretability is a concern. Traditional time series analysis methods, on the other hand, provide valuable insights based on a strong mathematical foundation. We shouldn’t abandon traditional methods just yet.

18.4. Reinforcement Learning

The “third golden age” of AI arrived with breakthroughs like Deep Q-Networks (DQNs) conquering Atari games and AlphaGo defeating the Go world champion. These achievements popularized the term “Deep Learning.”

While these achievements are rooted in the field of reinforcement learning (RL), the impact of Transformers is starting to influence RL research as well.

Here are two notable examples:

  • Decision Transformer (2021)
    This model tackles offline RL⁴, where the agent cannot interact with the environment during training. It leverages the Transformer architecture to recast RL as sequence modeling: trained on logged trajectories, the model is conditioned on a target return (the return-to-go) together with past states and actions, and predicts the next action (a minimal sketch of this framing follows the list).
  • MAT: Multi-Agent Transformer (2022)
    MAT addresses settings where multiple agents interact, cooperating or competing. It uses Transformer mechanisms to capture the dynamics and dependencies between agents.
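
As a rough sketch of the Decision Transformer framing (not the authors' code), the snippet below flattens one toy offline trajectory into an interleaved sequence of return-to-go, state, and action tokens; a causal Transformer would be trained on such sequences to predict the action tokens.

```python
import numpy as np

def to_decision_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples for one trajectory,
    the input format used by Decision Transformer-style models."""
    returns_to_go = np.cumsum(rewards[::-1])[::-1]   # reward still obtainable from step t
    sequence = []
    for rtg, s, a in zip(returns_to_go, states, actions):
        sequence.extend([("rtg", float(rtg)), ("state", s), ("action", a)])
    return sequence

# Toy offline trajectory (illustrative values only).
states  = [np.array([0.0, 1.0]), np.array([0.5, 0.5]), np.array([1.0, 0.0])]
actions = [0, 1, 0]
rewards = [0.0, 0.0, 1.0]

seq = to_decision_sequence(states, actions, rewards)
# At inference time a high target return is fed in first, steering the model
# toward actions consistent with achieving that return.
```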

18.5. Transformer Alternatives

Many alternative models to the Transformer are continuously being proposed.

Here are some recent examples:

While these alternatives are not yet mainstream, this does not imply that they are inferior to the Transformer.

Currently, research resources primarily focus on the Transformer and its variants. However, if one of these alternatives or a future model proves significantly more effective than the Transformer, it could potentially become the dominant architecture.


  1. ELMo is an LSTM-based model. For commercial models, it is impossible to say definitively whether they are Transformer-based, as some do not disclose details of their internal architecture. ↩︎

  2. It’s fascinating to me that GPT-2, a decoder built from Transformer modules, presents only one formula in its paper: p(x) = ∏ p(s_n | s_1, …, s_{n-1}). This is a familiar representation of language models for us.
    It reveals that N-gram models, RNN-based language models, and LLMs all share the foundation of statistical language models, even though LLMs appear like technology from another world, or magic, much like how DNA forms the basis for all life, from bacteria to human beings. ↩︎

  3. As a database engineer, I initially became interested in Deep Learning (and Reinforcement Learning) to explore the potential of applying LLMs to system administration tasks. ↩︎

  4. Online reinforcement learning agents interact with an environment in real-time. For example, in online learning, an AI agent plays the game of Go in real-time, adjusting its strategy based on the outcomes of each move during the game.
    In contrast, offline reinforcement learning uses a fixed dataset of pre-collected experiences to train the agent, such as learning from a historical dataset of Go games. This is useful for complex or costly environments where direct interaction is impractical, like optimizing industrial processes based on historical data. ↩︎