Techniques & Methods
Upstream Sampling
Upstream sampling (also called best-of-N sampling) generates N independent completions for the same prompt, scores each with a reward model or evaluation function, and returns the highest-scoring one. This trades compute for quality: cost grows roughly linearly with N, since every candidate must be generated and scored.
It is used in RLHF pipelines and in inference-time scaling strategies. Rather than changing the model's weights, upstream sampling improves output quality at inference time by drawing several samples from the model's output distribution and keeping the best.
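The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, the generator is a stand-in for a real sampling call to a language model, and the scorer is a stand-in for a real reward model.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n completions for one prompt and return the top-scoring one.

    `generate` and `score` are assumed interfaces: any sampling function
    and any reward or evaluation function with these shapes will work.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    # Draw n independent candidates from the same prompt.
    candidates = [generate(prompt) for _ in range(n)]

    # Score every candidate; cost is n generations plus n scoring calls.
    scores = [score(prompt, c) for c in candidates]

    # Keep the candidate with the highest score (first one wins on ties).
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index], scores[best_index]


# Toy stand-ins so the sketch runs without a model (purely illustrative).
_rng = random.Random(0)
_CANNED = ["Yes.", "Yes, because the cache is stale.", "It depends."]


def toy_generate(prompt: str) -> str:
    # Hypothetical sampler: picks a canned reply at random.
    return _rng.choice(_CANNED)


def toy_score(prompt: str, completion: str) -> float:
    # Hypothetical reward: longer answers score higher.
    return float(len(completion))


if __name__ == "__main__":
    answer, reward = best_of_n("Why is the page slow?", toy_generate, toy_score, n=8)
    print(answer, reward)
```

Because the model's weights are never touched, the only knob here is `n`: larger values explore more of the output distribution at proportionally higher inference cost.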
Related Terms
Techniques & Methods
Reinforcement Learning from Human Feedback (RLHF)
Training technique that refines AI models using feedback from human evaluators on output quality.
Techniques & Methods
Generation
Producing new text, code, or content based on learned patterns and a given input prompt.
Techniques & Methods
Response Quality
Evaluation of an AI response's relevance, coherence, accuracy, and helpfulness.
Model Components
Reward Models
Models trained to score AI outputs based on human preferences for use in reinforcement learning.

