Techniques & Methods

Low-Rank Adaptation (LoRA)

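LoRA freezes the original pre-trained model weights and injects trainable rank-decomposition matrices into the layers of the transformer. For each adapted weight matrix W, the update is learned as the product of two small matrices, B and A, whose shared inner dimension (the rank r) is far smaller than the dimensions of W. Instead of updating all billions of parameters, only these small adapter matrices are trained. The original paper reports that this can reduce the number of trainable parameters by up to 10,000x for GPT-3 175B while keeping quality on par with full fine-tuning.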
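The decomposition described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the names `LoRALinear`, `lora_param_counts`, and `matmul` are hypothetical, the layer sizes are made up, and the initialization (zeros for B, small Gaussian noise for A, output scaled by alpha / r) follows the scheme described in the LoRA paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in           # every entry of W is trainable
    lora = rank * (d_out + d_in)  # only B (d_out x r) and A (r x d_in) are trainable
    return full, lora


class LoRALinear:
    """Hypothetical linear layer: y = W x + (alpha / r) * B (A x), with W frozen."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight       # frozen pre-trained weights, never updated
        self.scale = alpha / rank
        # A starts as small random noise, B as zeros, so the adapter
        # contributes nothing until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        col = [[v] for v in x]                        # treat x as a column vector
        base = matmul(self.weight, col)               # frozen path: W x
        delta = matmul(self.B, matmul(self.A, col))   # low-rank path: B (A x)
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]


# Illustrative sizes only: one 4096 x 4096 projection adapted at rank 8.
print(lora_param_counts(4096, 4096, 8))  # (16777216, 65536)
```

For this single matrix the adapter trains 256 times fewer parameters than full fine-tuning would. Much larger ratios, such as the one reported for GPT-3, come from adapting only a subset of a very large model's matrices at a low rank.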

LoRA democratized LLM fine-tuning by making it feasible on consumer GPUs. Variants like QLoRA (quantized LoRA) further reduce memory requirements by training the adapters on top of a 4-bit quantized base model. Because the base weights never change, LoRA adapters can be swapped without reloading the base model, enabling efficient multi-task deployment.
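Adapter swapping can be illustrated with a toy example. The sketch below assumes a single shared weight matrix and a hypothetical `AdapterBank` class that keeps several (A, B) pairs side by side; the task names and numbers are invented for illustration, and real serving stacks handle this per layer and per request.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class AdapterBank:
    """Hypothetical holder for one frozen base weight and many LoRA adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # loaded once, shared by every task
        self.adapters = {}              # task name -> (A, B, scale)
        self.active = None              # None means "base model only"

    def add(self, name, A, B, scale=1.0):
        self.adapters[name] = (A, B, scale)

    def activate(self, name):
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name              # switching tasks is just a pointer change

    def forward(self, x):
        y = matvec(self.base_weight, x)
        if self.active is None:
            return y
        A, B, scale = self.adapters[self.active]
        delta = matvec(B, matvec(A, x))  # low-rank correction for the active task
        return [yi + scale * di for yi, di in zip(y, delta)]


bank = AdapterBank([[1.0, 0.0], [0.0, 1.0]])  # toy 2 x 2 base weight
bank.add("translate", A=[[1.0, 0.0]], B=[[1.0], [0.0]])
bank.add("summarize", A=[[0.0, 1.0]], B=[[0.0], [1.0]], scale=2.0)

bank.activate("translate")
print(bank.forward([2.0, 3.0]))  # [4.0, 3.0]
bank.activate("summarize")
print(bank.forward([2.0, 3.0]))  # [2.0, 9.0]
```

Only the small A and B matrices differ between tasks, so the memory cost of each additional task is the adapter alone.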

Authority Links

LoRA Paper — arXiv

Original LoRA paper on low-rank adaptation for LLM fine-tuning.

Hugging Face PEFT

Library for LoRA and other parameter-efficient fine-tuning methods.

Related Terms

Techniques & Methods

Fine-Tuning

Continuing the training of a pre-trained foundation model on a smaller, curated dataset to specialize its behavior, style, or domain expertise without losing its general capabilities.

Techniques & Methods

Supervised Fine-Tuning

Refining a pre-trained model's performance on a specific task using labeled example data.

Model Components

Parameter

A learnable variable within a model whose value is adjusted during training to minimize prediction error.

Model Components

Foundational Model

A large, versatile model trained on broad data that serves as a base for diverse downstream applications.
