Techniques & Methods
Masked Language Modeling
Masked language modeling (MLM), introduced in BERT, selects a random subset of input tokens (15% in BERT), replaces most of them with a special [MASK] token, and trains the model to predict the original tokens at those positions. Because a prediction target can sit anywhere in the sequence, the model must use context from both the left and the right of the masked position.
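The corruption step can be sketched in a few lines of Python. This is a minimal illustration, not BERT's actual implementation: the function name mask_tokens, the toy vocabulary, and the whitespace tokens are assumptions made for the example. The 15% selection rate and the 80/10/10 split (replace with [MASK], replace with a random token, leave unchanged) follow the recipe described in the BERT paper.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for one MLM training example.

    labels[i] holds the original token at each selected position and None
    elsewhere, so a loss would be computed only on selected positions.
    (Hypothetical helper for illustration; not a library API.)
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected, so it is not a prediction target
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # assumed toy vocabulary
    sentence = "the cat sat on the mat".split()
    print(mask_tokens(sentence, toy_vocab, seed=0))
```

Real pipelines operate on subword token IDs rather than strings and usually exclude special tokens such as [CLS] and [SEP] from selection, but the selection and replacement logic has the same shape.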
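MLM produces contextual token representations that transfer well to classification, named-entity recognition (NER), and question answering. It differs from causal (autoregressive) language modeling, used in GPT, which conditions each prediction only on the tokens to its left. As a result, MLM-trained models like BERT are generally favored for understanding tasks, while autoregressive models like GPT are better suited to text generation.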
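The difference in visible context can be made concrete with attention visibility matrices. The sketch below is an assumption-laden illustration (the function names are hypothetical, and real models build these masks as tensors inside the attention layers): entry [i][j] is True when position i may attend to position j.

```python
def bidirectional_mask(n):
    """MLM-style encoder: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]

def causal_mask(n):
    """Autoregressive decoder: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for name, build in [("bidirectional", bidirectional_mask), ("causal", causal_mask)]:
        print(name)
        for row in build(4):
            print(" ".join("1" if visible else "0" for visible in row))
```

Note that in MLM the target token is hidden by corrupting the input itself, not by the attention mask; the mask stays fully open so that both left and right neighbors inform the prediction.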
Related Terms
Techniques & Methods
Pre-training
Initial phase where a model learns general representations from large datasets before task-specific fine-tuning.
Model Components
Transformer
A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.
Model Components
Language Model
AI system that assigns probabilities to sequences of words and can generate coherent text.
Techniques & Methods
Training
Teaching a model to make accurate predictions by exposing it to large datasets.

