Kubnal Bridge

Techniques & Methods

Adversarial Training

Adversarial training augments the training set with deliberately challenging or misleading examples so that the model becomes more robust to them. In computer vision, adversarial examples are typically images with small, often imperceptible perturbations that cause a classifier to mispredict; in NLP, they include paraphrases, grammatical variations, and intentionally misleading queries.
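As an illustration of the idea above, here is a minimal sketch of adversarial training for a small logistic-regression classifier. It uses the fast gradient sign method (FGSM) to generate perturbed inputs, which is one common choice rather than a method named in this entry. The function names, the `epsilon` perturbation budget, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_perturb(x, y, w, b, epsilon):
    """Shift each input by epsilon per feature in the direction that
    increases the logistic loss (FGSM, used here as an assumed attack)."""
    p = sigmoid(x @ w + b)
    # Gradient of binary cross-entropy with respect to the input is (p - y) * w.
    grad_x = (p - y)[:, None] * w[None, :]
    return x + epsilon * np.sign(grad_x)


def train_adversarial(x, y, epsilon=0.1, lr=0.1, epochs=200):
    """Gradient descent on clean inputs plus their adversarial copies."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Regenerate adversarial examples against the current model.
        x_adv = fgsm_perturb(x, y, w, b, epsilon)
        x_all = np.concatenate([x, x_adv])
        y_all = np.concatenate([y, y])
        p = sigmoid(x_all @ w + b)
        # Standard logistic-regression gradient step on the augmented batch.
        w -= lr * (x_all.T @ (p - y_all)) / len(y_all)
        b -= lr * np.mean(p - y_all)
    return w, b
```

Calling `train_adversarial(x, y)` with a feature matrix `x` of shape `(n, d)` and 0/1 labels `y` returns the learned weights and bias. The adversarial copies are recomputed every epoch because they depend on the current parameters.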

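In GANs, adversarial training refers to the competition between the generator and discriminator networks: the generator tries to produce samples the discriminator cannot tell apart from real data, while the discriminator tries to distinguish them. In safety-focused LLM development, adversarial training builds on red-teaming: people craft inputs designed to elicit harmful outputs, and those inputs are then used as training data to improve refusal behavior.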
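For the GAN sense, the generator-discriminator competition is usually written as a minimax objective. The notation below is the standard textbook form and is not taken from this entry: G is the generator, D the discriminator, p_data the data distribution, and p_z the noise prior.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes this value by scoring real samples high and generated samples low, while the generator minimizes it by producing samples that D scores as real.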

Authority Links

Related Terms