Techniques & Methods

Attention Mechanism

The attention mechanism computes a weighted sum of input representations, where weights reflect how relevant each input position is to producing the current output. Originally introduced to improve machine translation (allowing the decoder to focus on relevant source words), it was later generalized to self-attention in transformers.

Attention enables transformers to capture long-range dependencies that RNNs struggled with, making them dramatically more effective for long documents, complex reasoning, and multilingual tasks.

Authority Links

Attention Mechanism — Wikipedia

Origins and mathematical formulation of the attention mechanism.

Attention Is All You Need — arXiv

The paper that generalized attention into the transformer architecture.

Related Terms

Techniques & Methods

Self-Attention

Mechanism allowing a model to weigh the importance of each part of an input relative to all other parts.

Model Components

Transformer

A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.

Techniques & Methods

Attention

Core mechanism in transformers that dynamically weights the importance of different input positions.

Model Components

Sequence-to-Sequence (Seq2Seq) Models

Models that transform input sequences into output sequences, used in translation and summarization.

Autoregression Attention