Techniques & Methods
Overuse Penalty
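Overuse penalties (also called repetition penalties) lower the probability of tokens that have already appeared in the generated output, discouraging the model from looping or producing monotonous text. Depending on the variant, a token is penalized once as soon as it has appeared, or by an amount that grows with the number of times it has appeared. The penalty is applied at inference time, during decoding, by adjusting the model's logits before the next token is chosen.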
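A minimal sketch of the multiplicative variant, modeled on the `repetition_penalty` generation option in Hugging Face Transformers, is shown below. It is an illustration, not any library's implementation: the function names `apply_repetition_penalty` and `softmax`, the toy four-token vocabulary, and the logit values are hypothetical. Each distinct token that has already been generated is penalized once: a positive logit is divided by the penalty and a negative logit is multiplied by it, so the token's logit never increases.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a penalized copy of `logits` (one score per vocabulary id).

    Hypothetical helper. Every token id occurring in `generated_ids` is
    penalized once: positive logits are divided by `penalty` (shrinking
    toward zero) and negative ones are multiplied by it (pushed further
    negative), so with penalty > 1 the token's logit never increases.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary; tokens 0 and 3 were already generated.
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 0, 3], penalty=1.5)
print("penalized logits:", penalized)
print("P(token 0) before/after:", softmax(logits)[0], softmax(penalized)[0])
```

Because this variant penalizes each distinct token once, a token seen ten times is penalized no more than a token seen once; count-sensitive variants instead subtract a term proportional to the occurrence count.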
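Repetition penalties are a standard, tunable parameter in LLM inference APIs. Setting the penalty too high suppresses repetition but can make the output incoherent, because the model is also pushed away from tokens it legitimately needs to reuse, such as common function words, names, and punctuation; setting it too low leaves degenerate, repetitive output unchecked.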
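The trade-off between penalty strength and coherence can be seen with a small sweep. This sketch uses an additive, count-based penalty modeled on the frequency and presence penalties described in OpenAI's API documentation: each logit is reduced by the token's occurrence count times `alpha_frequency`, plus `alpha_presence` if the token has appeared at all. The function names, the three-token vocabulary, and the logit values are hypothetical. Token 0 stands in for a word the text legitimately needs to reuse; as `alpha_frequency` grows, its probability collapses and a token that has never appeared takes over, which is the incoherence risk described above.

```python
import math
from collections import Counter

def apply_frequency_penalty(logits, generated_ids, alpha_frequency=0.0, alpha_presence=0.0):
    """Return a copy of `logits` with an additive, count-based penalty.

    Hypothetical helper. For each token seen c > 0 times:
        logit -= c * alpha_frequency + alpha_presence
    Tokens that never appeared are left unchanged.
    """
    adjusted = list(logits)
    for tok, count in Counter(generated_ids).items():
        adjusted[tok] -= count * alpha_frequency + alpha_presence
    return adjusted

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: token 0 is a frequent word the text legitimately reuses.
logits = [3.0, 1.0, 0.5]
history = [0, 0, 0, 1]  # token 0 generated three times, token 1 once

# Sweep the penalty strength and print the resulting distribution.
for alpha in (0.0, 0.5, 2.0):
    probs = softmax(apply_frequency_penalty(logits, history, alpha_frequency=alpha))
    print(f"alpha_frequency={alpha}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Since the frequency term scales with the count, the most-repeated token is hit hardest; at the largest setting in this toy sweep, token 2, which was never generated, becomes the most likely choice even though the unpenalized model strongly preferred token 0.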
Related Terms
Sequence Generation
Process in which models produce sequences, such as words or tokens, based on learned patterns.
Decoding Rules
Guidelines and algorithms that control how language models translate internal representations into output tokens.
Beam Search
Search algorithm that maintains multiple candidate sequences to find high-quality generated outputs.
Generation
Producing new text, code, or content based on learned patterns and a given input prompt.

