Kubnal Bridge

Techniques & Methods

Overuse Penalty

Overuse penalties (also called repetition penalties) lower the probability of tokens that have already appeared frequently in the generated output, which keeps the model from looping or producing monotonous text. The penalty is applied to the model's token scores during decoding at inference time; it does not change the model's weights.
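The mechanism can be illustrated with a minimal sketch. It assumes a count-based (subtractive) penalty, which is only one common formulation; multiplicative variants also exist, and real inference stacks may differ. The function name apply_overuse_penalty, the penalty value, and the toy logits are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def apply_overuse_penalty(logits, generated_ids, penalty=0.5):
    """Return a copy of `logits` with `penalty * count` subtracted from
    every token that already appears in `generated_ids`.

    Assumption: a subtractive, count-based penalty (illustrative only).
    """
    counts = Counter(generated_ids)
    penalized = list(logits)
    for token_id, count in counts.items():
        penalized[token_id] -= penalty * count
    return penalized


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of four tokens; token 0 has already been generated three times.
logits = [2.0, 1.0, 0.5, 0.1]
history = [0, 0, 0, 1]

before = softmax(logits)
after = softmax(apply_overuse_penalty(logits, history, penalty=0.5))

print("token 0 probability before:", round(before[0], 3))
print("token 0 probability after: ", round(after[0], 3))
```

In this sketch the heavily used token 0 loses probability mass, and tokens that have not been used yet gain it.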

Repetition penalties are a standard parameter in LLM inference APIs. Setting them too high reduces repetition but can cause incoherence, because the model is pushed away from words it legitimately needs to reuse; setting them too low allows degenerate, repetitive outputs.
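The trade-off can be seen in a toy greedy decoder. This is a sketch under strong assumptions: the base logits are fixed at every step (a real model recomputes them from context), the penalty is the subtractive count-based form, and greedy_decode and all numeric values are hypothetical.

```python
from collections import Counter


def greedy_decode(base_logits, steps, penalty):
    """Pick the highest-scoring token at each step, after subtracting
    `penalty * count` for every token generated so far.

    Assumption: `base_logits` stay constant across steps (toy setting).
    """
    generated = []
    counts = Counter()
    for _ in range(steps):
        scores = [logit - penalty * counts[i] for i, logit in enumerate(base_logits)]
        next_id = max(range(len(scores)), key=scores.__getitem__)
        generated.append(next_id)
        counts[next_id] += 1
    return generated


# Tokens 0 and 1 fit the context well; token 3 is a poor fit.
base_logits = [3.0, 2.5, 0.0, -4.0]

for penalty in (0.0, 0.3, 5.0):
    print(penalty, greedy_decode(base_logits, steps=6, penalty=penalty))
```

With these toy numbers, a penalty of 0.0 repeats token 0 at every step, 0.3 alternates between the two well-fitting tokens, and 5.0 is strong enough that the decoder eventually selects the poorly fitting token 3, a stand-in for incoherent output.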
