Model Components
Transformers
Transformers are the dominant neural network architecture in modern AI, introduced in the 2017 paper "Attention Is All You Need." They replaced recurrent networks for most sequence tasks by using self-attention to process all tokens in a sequence in parallel rather than one step at a time. This makes training far more efficient on parallel hardware and lets each token draw directly on any other token in the sequence.
Every major LLM, including GPT, Claude, Gemini, and Llama, is a transformer. The architecture has also been applied successfully beyond NLP to vision (Vision Transformers), audio, protein structure prediction (AlphaFold), and multimodal tasks.
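As a rough illustration of the self-attention step described above, here is a minimal sketch of single-head scaled dot-product attention in plain Python. The function names (`softmax`, `matmul`, `self_attention`), the toy input, and the identity projection matrices are assumptions made for illustration. Real models use learned projections, multiple heads, masking, and optimized tensor libraries.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no masking).

    x: one row per token, shape (seq_len, d_model).
    w_q, w_k, w_v: hypothetical projection matrices, shape (d_model, d_k).
    Returns (outputs, weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Transpose k so that q @ k_t gives one score per (query, key) pair.
    k_t = [list(col) for col in zip(*k)]
    # scores[i][j]: how strongly token i attends to token j, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row is normalized independently, so rows could be computed in parallel.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: 3 tokens with 2-dimensional embeddings (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
out, weights = self_attention(x, identity, identity, identity)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and describes how much one token draws on every other token. No row depends on the result of another, which is what lets the whole sequence be processed at once instead of step by step as in a recurrent network.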
Related Terms
Model Components
Transformer
A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.
Techniques & Methods
Self-Attention
Mechanism allowing a model to weigh the importance of each part of an input relative to all other parts.
Model Components
Large Language Model (LLM)
A transformer-based neural network with billions to trillions of parameters, trained on broad text corpora to predict the next token and able to generate, summarize, classify, and reason over natural language.
Techniques & Methods
Attention
Core mechanism in transformers that dynamically weights the importance of different input positions.

