Model Components

Encoder

The encoder reads the full input sequence and produces a contextualized representation for each token using bidirectional self-attention: each token can attend to every other token, both before and after it. BERT is the canonical encoder-only model and excels at understanding tasks such as classification and named-entity recognition (NER).
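To make "bidirectional" concrete, here is a minimal single-head sketch in NumPy. It is an illustration under simplifying assumptions, not BERT's actual implementation: the random weights, the sizes, and the function name bidirectional_self_attention are all made up for the example, and multi-head attention, layer normalization, and feed-forward layers are left out. The point to notice is that no causal mask is applied, so every row of the attention matrix puts weight on earlier and later positions alike.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with no causal mask (hypothetical helper)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights               # one contextual vector per token

# Illustrative sizes and random parameters (assumptions, not trained values).
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))       # stand-in token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

contextual, weights = bidirectional_self_attention(x, w_q, w_k, w_v)
print(contextual.shape)   # one d_model-sized vector per input token
print(weights.round(2))   # entries above the diagonal are attention to later tokens
```

A decoder-style (causal) layer would set the scores above the diagonal to negative infinity before the softmax; the encoder skips that step, which is what lets the first token's representation depend on the last token.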

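In encoder-decoder models (T5, BART), the encoder processes the input and the decoder generates the output conditioned on the encoder's representations. For understanding tasks, encoder-only models are typically faster at inference than generative models, because they process the whole input in a single forward pass instead of producing output one token at a time.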
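The way a decoder is "conditioned on the encoder's representations" is usually cross-attention: queries come from the decoder's states, while keys and values come from the encoder's outputs. The sketch below shows only that data flow. As a simplifying assumption it omits the learned query, key, and value projections that real models such as T5 or BART apply, and the names cross_attention, encoder_outputs, and decoder_states are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_outputs):
    """Decoder positions attend over encoder outputs (projections omitted)."""
    d = encoder_outputs.shape[-1]
    scores = decoder_states @ encoder_outputs.T / np.sqrt(d)  # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)
    return weights @ encoder_outputs, weights

# Illustrative shapes: 5 source tokens, 3 target positions, width 8.
rng = np.random.default_rng(1)
encoder_outputs = rng.normal(size=(5, 8))  # computed once for the whole input
decoder_states = rng.normal(size=(3, 8))   # one row per generated position so far

context, attn = cross_attention(decoder_states, encoder_outputs)
print(context.shape)  # one encoder-informed vector per decoder position
print(attn.shape)     # (target positions, source tokens)
```

Because the encoder outputs do not change during generation, they can be computed once and reused at every decoding step.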

Authority Links

Transformer Encoder — Wikipedia

How the encoder component processes input in transformers.

BERT Paper — arXiv

BERT: the landmark encoder-only transformer for NLP understanding.

Related Terms

Model Components

Transformer Decoder

Transformer component that generates output sequences by attending to encoded inputs and prior outputs.

Model Components

Transformer

A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions. It is the foundation of virtually every frontier language and multimodal model in production today.

Model Components

Sequence-to-Sequence (Seq2Seq) Models

Models that transform input sequences into output sequences, used in translation and summarization.

Techniques & Methods

Masked Language Modeling

Training technique in which the model predicts randomly masked tokens in a sequence from the surrounding context.

Foundational Model Embeddings