Applications
InstructGPT
InstructGPT, developed by OpenAI and described in a 2022 paper, demonstrated that fine-tuning GPT-3 with RLHF substantially improved its ability to follow diverse instructions, reduce harmful outputs, and respond more truthfully. Human evaluators preferred outputs from the 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3 base model, despite the much smaller parameter count.
InstructGPT established the training paradigm of supervised fine-tuning (SFT) followed by RLHF, which has become standard for aligning LLMs. It directly preceded ChatGPT and influenced alignment approaches at Anthropic (Constitutional AI) and Google DeepMind.
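The RLHF stage relies on a reward model trained from human comparisons between model outputs. As a minimal sketch of that idea, assuming a simplified case with one preferred and one rejected response whose scalar rewards are already computed, the pairwise preference loss can be written as follows. The function name and the example reward values are hypothetical and chosen only for illustration; this is not OpenAI's implementation.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response well above the rejected one, and large when it ranks them
    the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called on a large positive number.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two candidate responses to one prompt.
agrees_with_labeler = pairwise_preference_loss(2.0, 0.0)
disagrees_with_labeler = pairwise_preference_loss(0.0, 2.0)
print(agrees_with_labeler < disagrees_with_labeler)
```

When both responses receive the same reward, the loss equals log 2, and it shrinks toward zero as the preferred response's reward pulls ahead. In a full pipeline the trained reward model would then supply the training signal for the reinforcement learning step.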
Related Terms
Techniques & Methods
Reinforcement Learning from Human Feedback (RLHF)
Training technique that refines AI models using feedback from human evaluators on output quality.
Techniques & Methods
Supervised Fine-Tuning
Refining a pre-trained model's performance on a specific task using labeled example data.
Techniques & Methods
AI Alignment
The research field and engineering practice of building AI systems that reliably pursue goals humans actually want, remain controllable, and avoid harmful side effects — operationalized through RLHF, Constitutional AI, evaluations, and interpretability.
Model Components
Generative Pre-trained Transformer (GPT)
A family of decoder-only Transformer language models — pioneered by OpenAI — that combines large-scale unsupervised pre-training on text with task-specific alignment to produce general-purpose text generation.

