Techniques & Methods

Scaling Laws

Scaling laws in AI are empirical power-law relationships between a model's loss and its scale: the number of parameters, the amount of training data, and the training compute. Research at OpenAI (Kaplan et al., 2020) showed that test loss falls predictably as these quantities increase, which makes it possible to forecast a larger model's loss from smaller training runs before committing to the full run.
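As a rough illustration of how such a forecast works, the sketch below fits a pure power law, loss = a * size^(-alpha), to a handful of (model size, loss) points by linear regression in log-log space and then extrapolates to a larger size. The function names, the data points, and the constants 5.0 and 0.08 are hypothetical and chosen only for illustration; they are not values from the paper, and real fits often include an additional irreducible-loss term that this sketch omits.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; returns (a, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of log(loss) vs. log(size)
    intercept = mean_y - slope * mean_x  # log(a)
    return math.exp(intercept), -slope


def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a new model size."""
    return a * size ** (-alpha)


# Synthetic data generated from an assumed law (illustrative, not measured).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.08 for n in sizes]

a, alpha = fit_power_law(sizes, losses)
forecast = predict_loss(a, alpha, 1e10)  # extrapolate one decade beyond the data
print(f"a={a:.3f}, alpha={alpha:.3f}, forecast loss at 1e10 params={forecast:.3f}")
```

On real measurements the points will not lie exactly on a line in log-log space, so the fitted exponent carries uncertainty that grows with the distance of the extrapolation.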

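Scaling laws have driven the race to build ever-larger models. They also show that, for a fixed compute budget, model size and training data must be balanced rather than maximizing parameters alone. DeepMind's Chinchilla work (Hoffmann et al., 2022) found that parameters and training tokens should grow in roughly equal proportion, about 20 tokens per parameter, which is what "Chinchilla optimal" training refers to.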
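The balance between model size and data can be sketched with two widely used approximations, both of which are assumptions of this example rather than exact results: training compute is roughly 6 FLOPs per parameter per token (C ≈ 6·N·D), and the compute-optimal ratio is roughly 20 training tokens per parameter (D ≈ 20·N). Solving these together gives N ≈ sqrt(C / 120). The function and constant names below are hypothetical.

```python
import math

# Assumed rules of thumb (approximations, not exact constants):
FLOPS_PER_PARAM_TOKEN = 6.0  # C ~= 6 * N * D for dense transformer training
TOKENS_PER_PARAM = 20.0      # Chinchilla-style ratio D ~= 20 * N


def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, training tokens).

    Solves C = 6 * N * D with D = tokens_per_param * N for N.
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Example budget of 5.88e23 FLOPs (6 * 70e9 params * 1.4e12 tokens).
params, tokens = compute_optimal_allocation(5.88e23)
print(f"parameters ~ {params:.3g}, training tokens ~ {tokens:.3g}")
```

Under these assumptions, both the parameter count and the token count grow with the square root of the compute budget, so quadrupling compute doubles each.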

Authority Links

Scaling Laws Paper — arXiv

Original OpenAI paper on neural language model scaling laws.

Chinchilla Paper — arXiv

Optimal compute allocation for training large language models.

Related Terms

Model Components

Large Language Model (LLM)

A transformer-based neural network with billions to trillions of parameters, trained on broad text corpora to predict the next token and able to generate, summarize, classify, and reason over natural language.

Techniques & Methods

Pre-training

Initial phase where a model learns general representations from large datasets before task-specific fine-tuning.

Techniques & Methods

Training

Teaching a model to make accurate predictions by exposing it to large datasets.

Model Components

Foundational Model

Large versatile model trained on broad data that serves as a base for diverse downstream applications.

Self-Attention

Retrieval Augmented Generation (RAG)