Kubnal Bridge

Techniques & Methods

Low-Rank Adaptation (LoRA)

LoRA freezes the original pre-trained model weights and injects trainable low-rank decomposition matrices into the layers of the transformer. Instead of updating all of the model's parameters, which can number in the billions, only the small adapter matrices are trained. This can reduce the number of trainable parameters by up to 10,000x while maintaining comparable quality.
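To make the low-rank update concrete, here is a minimal NumPy sketch of a single linear layer with a LoRA adapter. The layer size, rank `r`, scaling factor `alpha`, and the function name `lora_forward` are illustrative assumptions, not values from any particular model or library. It shows the forward pass only; in training, gradients would flow to `A` and `B` while `W` stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer size, adapter rank, and scaling factor (illustrative only).
d_out, d_in, r = 512, 512, 8
alpha = 16

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init so the update starts at zero

def lora_forward(x):
    """Compute x @ (W + (alpha / r) * B @ A).T without forming the full update matrix."""
    base = x @ W.T                 # output of the frozen layer
    update = (x @ A.T) @ B.T       # low-rank path through the adapter
    return base + (alpha / r) * update

x = rng.standard_normal((4, d_in))
y = lora_forward(x)

full_params = W.size               # parameters a full fine-tune would update
lora_params = A.size + B.size      # parameters the adapter trains instead
print(full_params, lora_params, full_params // lora_params)
```

For this single small layer the adapter trains 32x fewer parameters than a full update would. The ratio grows with layer width and shrinks with rank, and larger overall reductions arise in very large models when adapters are attached to only some of the weight matrices.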

LoRA made LLM fine-tuning far more accessible, in many cases feasible on consumer GPUs. Variants like QLoRA, which trains LoRA adapters on top of a quantized base model, further reduce memory requirements. Because the base model weights are left unchanged, LoRA adapters can be swapped without reloading the base model, enabling efficient multi-task deployment.
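The adapter-swapping idea can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions: the task names, the `make_adapter` helper, and the dimensions are hypothetical, and the adapters are random stand-ins rather than trained weights. The point is only that one frozen base weight `W` is shared while a small per-task pair of matrices is selected at call time.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 64, 4                       # hypothetical layer width and adapter rank

W = rng.standard_normal((d, d))    # shared frozen base weight, loaded once

def make_adapter(seed):
    """Build a random stand-in for a trained adapter (hypothetical helper)."""
    g = np.random.default_rng(seed)
    return {
        "A": g.standard_normal((r, d)) * 0.1,
        "B": g.standard_normal((d, r)) * 0.1,
    }

# One small adapter per task; the task names are made up for illustration.
adapters = {"summarize": make_adapter(10), "translate": make_adapter(20)}

def forward(x, task):
    """Apply the shared base layer plus the adapter selected for this task."""
    ad = adapters[task]
    return x @ W.T + (x @ ad["A"].T) @ ad["B"].T

x = rng.standard_normal((2, d))
y_sum = forward(x, "summarize")
y_tr = forward(x, "translate")
```

Switching tasks here only changes which small `A` and `B` matrices are used; `W` is never copied or modified, which is what makes serving several tasks from one base model cheap.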
