Kubnal Bridge

Attention

Attention in transformers projects each token into query, key, and value vectors, scores every query against every key with a scaled dot product, normalizes those scores with a softmax, and uses the resulting weights to average the value vectors. Multi-head attention runs this process in parallel with different learned projections, so each head can capture a different type of relationship between tokens.
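The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the random weight matrices, and the toy sizes (5 tokens, model width 8, 2 heads) are assumptions made for the example, and the 1/sqrt(d_k) scaling and softmax follow the standard scaled dot-product formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis before exponentiating, for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k: (..., n, d_k); v: (..., n, d_v)
    d_k = q.shape[-1]
    # Dot-product similarity between every query and every key: (..., n, n)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights         # weighted average of the value vectors

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    n, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (n, d_model) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    heads, weights = attention(q, k, v)  # all heads computed in parallel
    # Concatenate the heads back to (n, d_model), then mix them with w_o.
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ w_o, weights

# Hypothetical toy input: random values stand in for learned projections.
rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 8, 2
x = rng.normal(size=(n, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Here `out` has one row per token and `weights` holds one n × n attention matrix per head. In a trained model the projection matrices are learned rather than random.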

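Attention is both the key innovation of modern AI and its primary scaling bottleneck: standard attention takes O(n²) time and memory in the sequence length n, which makes long contexts expensive. Efficient attention variants address this cost in different ways: Flash Attention computes exact attention without materializing the full n × n score matrix, which reduces memory traffic, while Sparse Attention restricts each token to a subset of positions, which reduces the number of scores computed.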
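To make the memory side of this concrete, the sketch below compares standard attention, which builds the full n × n score matrix, with a blockwise version that processes keys in chunks and keeps a running softmax. It only illustrates the general idea behind memory-efficient exact attention. It is not the Flash Attention kernel, which is a fused GPU implementation that also tiles over queries. The function names and the block size are assumptions for the example.

```python
import numpy as np

def full_attention(q, k, v):
    # Materializes an (n, n) score matrix: quadratic memory in n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def blockwise_attention(q, k, v, block=4):
    # Same result, but only an (n, block) slice of scores exists at a time.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)       # running max score per query
    l = np.zeros((n, 1))               # running softmax denominator
    acc = np.zeros((n, v.shape[-1]))   # running unnormalized output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                          # (n, block)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        correction = np.exp(m - m_new)                # rescale earlier partial sums
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ vb
        m = m_new
    return acc / l

# Hypothetical toy input; n is deliberately not a multiple of the block size.
rng = np.random.default_rng(0)
n, d = 10, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))

dense = full_attention(q, k, v)
tiled = blockwise_attention(q, k, v, block=4)
```

Both functions still perform O(n²) arithmetic. The blockwise version changes only how much of the score matrix is held in memory at once, which is the cost that grows fastest with long contexts.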
