Model Components
Encoder
The encoder reads the full input sequence and produces a contextualized representation for each token using bidirectional self-attention: each token can attend to every other token, both before and after it. BERT is the canonical encoder-only model and excels at understanding tasks such as classification and named-entity recognition (NER).
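To make "bidirectional" concrete, here is a minimal single-head scaled dot-product self-attention sketch in plain Python. It is a simplification, not how BERT is implemented: the queries, keys, and values are the input vectors themselves (no learned projection matrices), and the `causal` flag is an illustrative switch that applies the decoder-style mask hiding later positions. With `causal=False`, every token receives nonzero weight from every position, which is what an encoder does.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability; masked (-inf) scores become 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Single-head scaled dot-product self-attention over a list of token vectors.

    Simplified sketch: queries, keys, and values are the inputs themselves
    (no learned projections). Returns (outputs, attention_weights).
    """
    n, d = len(x), len(x[0])
    weights, outputs = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                # Decoder-style mask: token i may not look at later tokens.
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, bidirectional = self_attention(tokens)
_, masked = self_attention(tokens, causal=True)
print(all(w > 0 for w in bidirectional[0]))  # True: the first token sees all positions
print(masked[0])                              # [1.0, 0.0, 0.0]: with a causal mask it sees only itself
```

Because the first token's representation depends on tokens that come after it, encoders suit tasks where the whole input is available up front, such as classifying a sentence or tagging entities in it.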
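In encoder-decoder models (T5, BART), the encoder processes the input and the decoder generates the output conditioned on the encoder's representations. On understanding tasks, encoder-only models are typically faster at inference than generative models, because they process the whole input in a single forward pass instead of producing output one token at a time.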
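In encoder-decoder Transformers, the decoder is conditioned on the encoder through cross-attention: queries come from the decoder's positions, while keys and values come from the encoder outputs. The sketch below is a simplified, single-head version under stated assumptions (no learned projections, no multi-head splitting). The name `cross_attention` is illustrative, not an API from T5 or BART.

```python
import math


def cross_attention(decoder_states, encoder_outputs):
    """Each decoder position attends over all encoder outputs.

    Simplified sketch: queries are the decoder states, and keys and values are
    the encoder outputs, with no learned projections and a single head.
    """
    d = len(encoder_outputs[0])
    results = []
    for q in decoder_states:
        # Similarity of this decoder query to every encoder position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in encoder_outputs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        # Weighted sum of encoder outputs, one vector per decoder position.
        results.append(
            [sum(w[j] * encoder_outputs[j][c] for j in range(len(encoder_outputs))) for c in range(d)]
        )
    return results


encoder_outputs = [[0.5, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.5, 0.5], [1.0, 1.0, 1.0]]
decoder_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
context = cross_attention(decoder_states, encoder_outputs)
print(len(context), len(context[0]))  # 2 3: one context vector per decoder position
```

The encoder runs once over the input, and its outputs are reused at every decoding step. The decoder, however, still has to be run once per generated token, so generation costs more than a single encoder-only pass.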
Related Terms
Model Components
Transformer Decoder
Transformer component that generates output sequences by attending to encoded inputs and prior outputs.
Model Components
Transformer
A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.
Model Components
Sequence-to-Sequence (Seq2Seq) Models
Models that transform input sequences into output sequences, used in translation and summarization.
Techniques & Methods
Masked Language Modeling
Training technique where the model predicts randomly hidden words in a sequence.

