Miscellaneous
Test Data
Test data is a held-out split kept isolated from all model development decisions and used only for final evaluation, after training and validation are complete. Because no modeling or tuning choice is ever made against it, performance estimates are not inflated by overfitting to the evaluation set.
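As a rough illustration of this isolation, the sketch below shuffles a dataset once and carves out three disjoint splits. The function name `split_dataset`, the 80/10/10 proportions, and the fixed seed are assumptions chosen for the example, not a prescribed recipe.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then return disjoint (train, validation, test) lists.

    The fractions and seed are illustrative defaults (assumptions).
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible

    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)

    test = items[:n_test]                 # set aside; touched only at the very end
    val = items[n_test:n_test + n_val]    # used for tuning and monitoring
    train = items[n_test + n_val:]        # used to fit parameters
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

In practice the test split would be written to separate storage at this point and not read again until the final evaluation run.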
Data contamination, where test data inadvertently appears in training data, is a significant challenge for LLM evaluation: models trained on vast amounts of internet text may already have seen benchmark examples, so their scores can reflect memorization rather than generalization. Careful dataset curation and the creation of new held-out benchmarks help mitigate this.
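One simple way to screen for contamination is to look for word n-grams shared between test examples and the training corpus. The sketch below is a minimal version of that idea; the helper names, whitespace tokenization, lowercasing, and the n-gram length are all assumptions, and real pipelines typically need more robust normalization and scalable indexing.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text (whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated_examples(test_examples, training_corpus, n=8):
    """Return test examples sharing at least one n-gram with the training corpus.

    n=8 is an illustrative default (assumption); shorter values flag more
    examples, including coincidental overlaps.
    """
    train_ngrams = set()
    for document in training_corpus:
        train_ngrams |= ngrams(document, n)
    return [ex for ex in test_examples if ngrams(ex, n) & train_ngrams]


# Toy data, made up for this example.
training_corpus = ["the quick brown fox jumps over the lazy dog"]
test_examples = [
    "A quick brown fox appeared",
    "completely unrelated sentence here now",
]
flagged = contaminated_examples(test_examples, training_corpus, n=3)
print(flagged)
```

Flagged examples are candidates for removal from the benchmark or for separate reporting; an exact-overlap check like this will miss paraphrased or translated copies.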
Related Terms
Miscellaneous
Validation Data
A held-out data split used during training to tune hyperparameters and monitor generalization.
Miscellaneous
Training Data
The labeled or unlabeled dataset used to fit a model's parameters during the learning process.
Techniques & Methods
Evaluation Metrics
Quantitative measures used to assess how well an AI model performs on a task.
Miscellaneous
Dataset
An organized collection of data examples prepared for training, evaluating, or testing AI models.

