Techniques & Methods
Data Augmentation
Data augmentation artificially expands a training set by applying label-preserving transformations to existing examples: flipping and cropping images, adding noise to audio, paraphrasing sentences, or back-translating text. It improves model robustness and reduces overfitting, especially when original data is scarce.
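As a rough illustration of the image and audio transformations mentioned above, here is a minimal sketch in plain Python. It assumes an image is a list of pixel rows and an audio clip is a list of float samples; the function names (`horizontal_flip`, `random_crop`, `add_gaussian_noise`) and the noise level are hypothetical choices made for this example, not part of any particular library.

```python
import random

def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def add_gaussian_noise(samples, sigma, rng):
    """Add zero-mean Gaussian noise to a 1D signal such as audio samples."""
    return [s + rng.gauss(0.0, sigma) for s in samples]

rng = random.Random(0)  # seeded so runs are repeatable
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

flipped = horizontal_flip(image)
crop = random_crop(image, 2, 2, rng)
noisy = add_gaussian_noise([0.0, 0.5, -0.5], sigma=0.01, rng=rng)
```

Each call produces a new variant while leaving the original example untouched, so one labeled example can yield several training examples that share its label.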
In NLP, augmentation techniques include synonym replacement, random insertion/deletion, back-translation, and using LLMs to generate paraphrases. For LLM pre-training, data augmentation is less common given the abundance of internet text, but it is critical in specialized low-resource domains.
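The token-level techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYNONYMS` table is a tiny hypothetical stand-in for a real thesaurus, the function names and default probabilities are invented for this example, and back-translation and LLM paraphrasing are left out because they depend on external models.

```python
import random

# Hypothetical toy synonym table; a real system would use a proper thesaurus.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}

def synonym_replacement(tokens, rng, p=0.5):
    """Replace each token that has known synonyms with probability p."""
    out = []
    for tok in tokens:
        options = SYNONYMS.get(tok.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out

def random_deletion(tokens, rng, p=0.2):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_insertion(tokens, rng):
    """Insert a synonym of one randomly chosen token at a random position."""
    candidates = [t for t in tokens if t.lower() in SYNONYMS]
    if not candidates:
        return list(tokens)  # nothing to insert; return an unchanged copy
    new_word = rng.choice(SYNONYMS[rng.choice(candidates)].copy())
    out = list(tokens)
    out.insert(rng.randint(0, len(out)), new_word)
    return out

rng = random.Random(0)  # seeded so runs are repeatable
sentence = "the quick dog looks happy".split()

replaced = synonym_replacement(sentence, rng, p=1.0)
shortened = random_deletion(sentence, rng, p=0.2)
lengthened = random_insertion(sentence, rng)
```

In practice the probabilities are kept low so that the augmented sentence still carries the meaning, and therefore the label, of the original.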
Related Terms
Miscellaneous
Training Data
The labeled or unlabeled dataset used to fit a model's parameters during the learning process.
Core Concepts
Overfitting
A failure mode in which a model learns the detail and noise in its training data too thoroughly, reducing its ability to generalize to new data.
Techniques & Methods
Training
Teaching a model to make accurate predictions by exposing it to large datasets.
Miscellaneous
Dataset
An organized collection of data examples prepared for training, evaluating, or testing AI models.

