Techniques & Methods

Upstream Sampling

Upstream sampling (also called best-of-N sampling) generates N independent completions for the same prompt, then selects the one that scores highest under a reward model or other evaluation function. This trades compute for quality: generation cost grows roughly linearly with N.

The technique appears in RLHF pipelines and as an inference-time scaling strategy. It leaves the model's weights unchanged and instead improves output quality at inference time by drawing several samples from the model's output distribution and keeping the best one.
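
The selection step can be illustrated with a minimal Python sketch. It is not tied to any particular library: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names introduced here, and the toy generator and length-based scorer are stand-ins for a real sampling model and a real reward model.

```python
from itertools import cycle
from typing import Callable, Iterator, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Draw n completions for the prompt and return the top-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each call to generate() is assumed to be an independent sample.
    candidates = [generate(prompt) for _ in range(n)]
    # score() plays the role of the reward model or evaluation function.
    scores = [score(prompt, c) for c in candidates]
    # On ties, max() keeps the earliest candidate.
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


# Hypothetical stand-ins so the sketch runs without a real model.
_CANNED: Iterator[str] = cycle(
    ["short", "a medium answer", "the longest answer of all"]
)


def toy_generate(prompt: str) -> str:
    # Assumption: a real generator would sample from a language model.
    return next(_CANNED)


def toy_score(prompt: str, completion: str) -> float:
    # Assumption: a real scorer would be a learned reward model.
    return float(len(completion))


if __name__ == "__main__":
    completion, reward = best_of_n("Explain sampling.", toy_generate, toy_score, n=3)
    print(completion, reward)
```

Passing the generator and scorer as callables keeps the selection logic independent of the model and the reward function, so either can be swapped without touching `best_of_n`. Raising `n` increases the number of generation and scoring calls in direct proportion.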

Authority Links

Sampling in ML — Wikipedia

Statistical foundations of sampling in machine learning.

Best-of-N Sampling — arXiv

Research on best-of-N sampling as an inference-time scaling strategy.

Related Terms

Techniques & Methods

Reinforcement Learning from Human Feedback (RLHF)

Training technique that refines AI models using feedback from human evaluators on output quality.

Techniques & Methods

Generation

Producing new text, code, or content based on learned patterns and a given input prompt.

Techniques & Methods

Response Quality

Evaluation of an AI response's relevance, coherence, accuracy, and helpfulness.

Model Components

Reward Models

Models trained to score AI outputs based on human preferences for use in reinforcement learning.
