Techniques & Methods
Attention Mechanism
The attention mechanism computes a weighted sum of input representations, where weights reflect how relevant each input position is to producing the current output. Originally introduced to improve machine translation (allowing the decoder to focus on relevant source words), it was later generalized to self-attention in transformers.
Attention enables transformers to capture long-range dependencies that RNNs struggled with, making them dramatically more effective for long documents, complex reasoning, and multilingual tasks.
Authority Links
Related Terms
Techniques & Methods
Self-Attention
Mechanism allowing a model to weigh the importance of each part of an input relative to all other parts.
Model Components
Transformer
A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.
Techniques & Methods
Attention
Core mechanism in transformers that dynamically weights the importance of different input positions.
Model Components
Sequence-to-Sequence (Seq2Seq) Models
Models that transform input sequences into output sequences, used in translation and summarization.

