Kubnal Bridge

Test Data

Test data is held out from every model development decision (training, hyperparameter tuning, model selection) and is used only for final evaluation, after training and validation are complete. This ensures that performance estimates are not inflated by overfitting to the evaluation set.
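As an illustration, here is a minimal sketch of the workflow, assuming a simple random three-way split. The function name `train_val_test_split`, the 10% validation and test fractions, the fixed seed, and the toy threshold "model" are all hypothetical choices for this example. The point is the ordering: the candidate threshold is chosen using validation data only, and the test split is scored once, after selection is finished.

```python
import random


def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed and cut disjoint train/val/test splits (hypothetical helper)."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def accuracy(threshold, data):
    """Toy 'model': predict True when x exceeds the threshold."""
    return sum((x > threshold) == y for x, y in data) / len(data)


data = [(x, x > 50) for x in range(100)]
train, val, test = train_val_test_split(data, seed=42)

# Model selection (here, picking a threshold) looks at validation data only.
best = max(range(0, 100, 10), key=lambda t: accuracy(t, val))

# The test split is used exactly once, after all decisions are made.
print("test accuracy:", accuracy(best, test))
```

Because the split is seeded, rerunning the script reproduces the same partition, so the test set stays fixed across experiments. If the test score were used to pick among thresholds, it would no longer be an unbiased estimate of performance.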

Data contamination, where test data inadvertently appears in the training data, is a significant challenge for LLM evaluation: models trained on vast amounts of internet text may already have seen benchmark examples. Careful dataset curation (for example, checking training data for overlap with evaluation sets) and the creation of new held-out benchmarks help mitigate this.
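One simple curation check is to flag test examples that share a long word n-gram with the training corpus. The sketch below assumes lowercase `\w+` tokenization and an 8-gram window; both are illustrative choices, and `ngrams` and `flag_contaminated` are hypothetical helpers rather than a standard tool. The check catches verbatim or near-verbatim overlap only, not paraphrased or translated leakage.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in text (empty if text has fewer than n tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(test_examples, training_corpus, n=8):
    """Return indices of test examples sharing at least one n-gram with any training document."""
    train_grams = set()
    for doc in training_corpus:
        train_grams |= ngrams(doc, n)
    return [i for i, ex in enumerate(test_examples) if ngrams(ex, n) & train_grams]


training_corpus = [
    "The quick brown fox jumps over the lazy dog near the river bank today."
]
test_examples = [
    "Q: the quick brown fox jumps over the lazy dog near the river? A: yes",
    "Q: what is the capital of France? A: Paris",
]
print(flag_contaminated(test_examples, training_corpus))  # [0]
```

Smaller `n` flags more examples (higher recall, more false positives); larger `n` flags only long verbatim matches. Flagged examples can then be removed from the benchmark or reported separately.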
