Model Components

Transformers

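Transformers are the dominant neural network architecture in modern AI. Introduced in the 2017 paper "Attention Is All You Need," they replaced recurrent networks for most sequence tasks by using self-attention, which processes all tokens of a sequence in parallel instead of one step at a time. This parallelism makes training efficient on modern hardware, and because every token can attend directly to every other token, long-range dependencies are easier to capture.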

Every major LLM, including GPT, Claude, Gemini, and Llama, is built on the transformer architecture. The architecture has also been applied successfully beyond NLP, to vision (Vision Transformers), audio, protein structure prediction (AlphaFold), and multimodal tasks.
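To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a reference implementation: the function names (`softmax`, `dot`, `self_attention`) and the toy input are hypothetical, the token vectors serve directly as queries, keys, and values (a real transformer layer first applies learned projections), and there is only a single attention head.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys, and values are the token
    vectors themselves; real layers use learned projection matrices.
    """
    scale = math.sqrt(len(tokens[0]))
    outputs, all_weights = [], []
    # Each position's result depends only on the inputs, not on other
    # positions' results, so all positions could be computed in parallel.
    for query in tokens:
        scores = [dot(query, key) / scale for key in tokens]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, tokens))
                 for i in range(len(query))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three 2-dimensional token vectors (made up for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

In this toy input the first token is orthogonal to the second, so the first row of `weights` gives the second token less weight than the other two. Production implementations express the same computation as batched matrix multiplications so that hardware can evaluate all positions at once.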

Authority Links

Transformer — Wikipedia

Architecture, variants, and applications of transformer models.

Attention Is All You Need — arXiv

Original paper introducing the transformer architecture.

Related Terms

Model Components

Transformer

A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions. It is the foundation of virtually every frontier language and multimodal model in production today.

Techniques & Methods

Self-Attention

Mechanism allowing a model to weigh the importance of each part of an input relative to all other parts.

Model Components

Large Language Model (LLM)

A transformer-based neural network with billions to trillions of parameters, trained on broad text corpora to predict the next token and able to generate, summarize, classify, and reason over natural language.

Techniques & Methods

Attention

Core mechanism in transformers that dynamically weights the importance of different input positions.
