Techniques & Methods
Reinforcement Learning
Reinforcement learning (RL) trains agents to make sequences of decisions that maximize the cumulative reward they receive from an environment. Unlike supervised learning, it uses no labeled examples: the agent must explore to discover which actions lead to high reward.
RL underpins game-playing systems such as AlphaGo and Atari-playing agents, robotic control, and the RLHF pipelines used to align language models. Its core challenge is the exploration-exploitation trade-off: weighing the value of trying new actions against exploiting actions already known to work well.
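The exploration-exploitation trade-off can be illustrated with an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. The sketch below is illustrative only: the function name `run_epsilon_greedy`, the arm payout probabilities, and the value `epsilon=0.1` are assumptions chosen for the example, not part of any particular library or of the systems named above.

```python
import random


def run_epsilon_greedy(arm_probs, epsilon=0.1, steps=5000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    arm_probs: hypothetical payout probability of each arm (unknown to the agent).
    epsilon:   probability of picking a random arm (exploration).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # how often each arm was pulled
    values = [0.0] * n_arms    # running mean reward estimate per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0

        # Incremental update of the mean reward for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return values, counts, total_reward


values, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("pull counts:", counts)
print("cumulative reward:", total_reward)
```

Setting `epsilon` to 0 makes the agent purely greedy, so it can lock onto whichever arm pays out first; setting it to 1 makes it explore forever and never use what it has learned. Values in between trade some short-term reward for better estimates.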
Related Terms
Techniques & Methods
Reinforcement Learning from Human Feedback (RLHF)
Training technique that refines AI models using feedback from human evaluators on output quality.
Techniques & Methods
Proximal Policy Optimization (PPO)
RL algorithm that stabilizes training by constraining the size of each policy update, typically through a clipped objective.
Techniques & Methods
Markov Decision Process
Mathematical framework modeling sequential decision-making in environments with probabilistic outcomes.
Model Components
Reward Models
Models trained to score AI outputs based on human preferences for use in reinforcement learning.

