Core Concepts
Big Data
Big data refers to datasets too large and complex for traditional data-processing tools to handle. It is commonly characterized by the three Vs: Volume (scale), Velocity (speed of generation), and Variety (structured and unstructured types). Technologies like Hadoop, Spark, and cloud data warehouses were built to store and process data at this scale.
Big data is the fuel for modern AI. The scale of training data (trillions of tokens for frontier LLMs) is a primary driver of model capability, which makes data collection, curation, and governance central to AI development.
Related Terms
Core Concepts
Machine Learning
Getting computers to learn from data and improve at tasks without being explicitly programmed for them.
Miscellaneous
Training Data
The labeled or unlabeled dataset used to fit a model's parameters during the learning process.
Miscellaneous
Data Science
Interdisciplinary field combining statistics, programming, and domain knowledge to extract insights from data.
Miscellaneous
Dataset
An organized collection of data examples prepared for training, evaluating, or testing AI models.

