Kubnal Bridge

Data Augmentation

Data augmentation artificially expands a training set by applying label-preserving transformations to existing examples: flipping and cropping images, adding noise to audio, paraphrasing sentences, or back-translating text. It tends to improve model robustness and reduce overfitting, especially when original data is scarce.
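As a rough illustration of the image and audio transformations mentioned above, here is a minimal Python sketch using only the standard library. The function names (horizontal_flip, random_crop, add_gaussian_noise), the nested-list image representation, and the noise level are all assumptions made for this example; real pipelines would normally work on arrays or tensors through a dedicated library.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image, height, width, rng):
    """Cut a height x width window from a random position in the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def add_gaussian_noise(samples, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each sample."""
    return [s + rng.gauss(0.0, sigma) for s in samples]


# A seeded generator keeps the augmentations reproducible (assumed seed).
rng = random.Random(0)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
flipped = horizontal_flip(image)          # same content, mirrored
cropped = random_crop(image, 2, 2, rng)   # a 2 x 2 sub-window
noisy = add_gaussian_noise([0.0, 0.5, -0.5], 0.01, rng)
```

Each call produces a new example that should keep the label of the original, which is what lets the augmented copies be added to the training set.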

In NLP, augmentation techniques include synonym replacement, random insertion/deletion, back-translation (translating a sentence into another language and back again), and using LLMs to generate paraphrases. For LLM pre-training, data augmentation is less common given the abundance of internet text, but it can be critical in specialized low-resource domains where little text is available.
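The token-level techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SYNONYMS table is a hypothetical toy dictionary (a real system would draw on a thesaurus or a language model), and the function names and probabilities are chosen for the example.

```python
import random

# Hypothetical toy synonym table, used only for this illustration.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}


def synonym_replace(tokens, rng, p=0.5):
    """Replace each token that has known synonyms with probability p."""
    out = []
    for tok in tokens:
        options = SYNONYMS.get(tok.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_insert(tokens, rng, n=1):
    """Insert n synonyms of randomly chosen tokens at random positions."""
    out = list(tokens)
    candidates = [t for t in tokens if t.lower() in SYNONYMS]
    if not candidates:
        return out
    for _ in range(n):
        synonym = rng.choice(SYNONYMS[rng.choice(candidates).lower()])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


rng = random.Random(0)
sentence = "the quick dog looks happy".split()
variants = [
    synonym_replace(sentence, rng),
    random_delete(sentence, rng),
    random_insert(sentence, rng),
]
```

Perturbations like these are cheap but can change meaning, so the replacement and deletion probabilities are usually kept small.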
