Kubnal Bridge

Techniques & Methods

Masked Language Modeling

Masked language modeling (MLM), introduced in BERT, randomly replaces tokens in an input with a [MASK] token and trains the model to predict the original tokens. This bidirectional approach forces the model to use context from both sides of the masked position.

MLM produces powerful contextual representations useful for classification, NER, and question answering. It differs from causal/autoregressive language modeling (used in GPT) which only sees left context, making MLM models like BERT better at understanding and GPT models better at generation.

Authority Links

Related Terms