Techniques & Methods

Prompt Injection

Prompt injection exploits the fact that LLMs cannot reliably distinguish between instructions from trusted sources (system prompts) and untrusted inputs (user data or web content). An attacker embeds instructions in retrieved content that override system-level directives.

Direct injection attacks user-controlled inputs; indirect injection hides instructions in external content the AI retrieves (web pages, documents). It is a critical security concern for AI applications that process external data.

Authority Links

Prompt Injection — Wikipedia

How prompt injection attacks work and why they are difficult to prevent.

OWASP LLM Top 10

Security risks including prompt injection in LLM applications.

Related Terms

Techniques & Methods

Prompt Engineering

The discipline of designing input text — instructions, examples, constraints, and context — to reliably steer a language model toward accurate, well-formatted, and intent-aligned outputs without modifying model weights.

Techniques & Methods

System Prompt

Internal instructions that guide an AI model's behavior, tone, and response style.

Techniques & Methods

Prompt

Text input provided to an AI model to guide the content and format of its response.

Techniques & Methods

AI Alignment

The research field and engineering practice of building AI systems that reliably pursue goals humans actually want, remain controllable, and avoid harmful side effects — operationalized through RLHF, Constitutional AI, evaluations, and interpretability.

Proximal Policy Optimization (PPO)Prompt Engineering