AI Glossary
Key terms and concepts behind AI search, generative engine optimisation, and LLM visibility—explained plainly.
Core Concepts
Zone of Proximal Development (ZPD)
Concept borrowed from educational psychology: the range of tasks a learner (here, an AI) can perform with guidance but not yet independently.
Core Concepts
Weak AI
AI designed and trained for a specific task, lacking general cognitive abilities.
Core Concepts
Variance
Amount by which a model's predictions would change if it were trained on a different sample of data, reflecting its sensitivity to the training set.
Core Concepts
Unsupervised Learning
Models learn patterns from unlabeled data without explicit instructions.
Core Concepts
Turing Test
Test of a machine's ability to exhibit intelligent behavior indistinguishable from a human.
Core Concepts
Token
Smallest processing unit in NLP: a word, word part, or character.
Core Concepts
Supervised Learning
Models trained on labeled data, learning to predict outcomes from inputs.
Core Concepts
Strong AI
AI with the ability to understand, learn, and apply knowledge like human intelligence.
Core Concepts
Overfitting
Model learns detail and noise in training data too thoroughly, reducing generalization.
Core Concepts
Natural Language Understanding (NLU)
AI's ability to understand and interpret human language meaning and intent.
Core Concepts
Natural Language Processing (NLP)
Field focused on enabling computer-human interaction through natural language.
Core Concepts
Natural Language Generation (NLG)
Generating coherent, contextually relevant text from structured data or prompts.
Core Concepts
Pattern Recognition
Automated recognition of patterns and regularities in data.
Core Concepts
Latent Variables
Hidden or unobservable variables inferred from observable data in AI models.
Core Concepts
Intent
Underlying purpose or goal a user aims to achieve through a query.
Core Concepts
Hyperparameter
Parameter set before learning begins that controls the training process.
Core Concepts
Explainable AI (XAI)
AI systems that provide transparent insights into their decision-making processes.
Core Concepts
Entities
Specific, identifiable elements like names, places, and dates extracted from text.
Core Concepts
Deep Learning
Subset of ML using neural networks with many layers to learn increasingly abstract representations of complex data.
Core Concepts
Computational Learning Theory
Branch of AI focused on understanding the mathematical foundations of learning algorithms.
Core Concepts
Cognitive Computing
Systems designed to simulate human brain functioning, reasoning, and problem-solving.
Core Concepts
Bias
Systematic skew in an AI model's outputs, often inherited from its training data or design, that affects decision-making and fairness.
Core Concepts
Big Data
Extremely large datasets that reveal patterns, trends, and associations through computational analysis.
Core Concepts
Autonomous
Machines capable of performing tasks and making decisions without human intervention.
Core Concepts
Augmented Intelligence
Enhancing human decision-making with AI, focusing on human-AI collaboration rather than replacement.
Core Concepts
Algorithm
A set of mathematical instructions or rules computers follow to accomplish specific tasks.
Core Concepts
AI (Artificial Intelligence)
Simulation of human intelligence processes by machines, particularly computer systems.
Core Concepts
General AI
AI that exhibits cognitive functions across multiple domains, like human general intelligence.
Core Concepts
Machine Learning
Getting computers to learn from data and improve at tasks without explicit programming.
Core Concepts
Machine Intelligence
Machines' capabilities to learn from data and perform intelligent tasks.
Core Concepts
Generative AI
AI systems that produce new content — text, images, audio, video, or code — by learning the statistical distributions of training data and sampling from them, rather than retrieving stored outputs.
Techniques & Methods
Zero-Shot Learning
Model's ability to correctly perform tasks it was not explicitly trained for.
Techniques & Methods
Word Embedding
Technique representing words as dense vectors that capture semantic similarity.
Techniques & Methods
Vector Representation
Encoding words, sentences, or concepts as numerical vectors for AI comparison and retrieval.
Techniques & Methods
Variation
Different expressions or phrasings that convey the same underlying meaning.
Techniques & Methods
Validation
Evaluating model performance on data held separate from the training set.
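As a minimal sketch of holding data separate from the training set, the snippet below shuffles a list of examples and sets aside a fraction for validation. The function name `train_validation_split`, the 20% default, and the fixed seed are illustrative assumptions, not a prescribed procedure.

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = examples[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)       # seeded so the split is repeatable
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (training set, validation set)

train, val = train_validation_split(list(range(10)))
print(len(train), len(val))
```

The model is fit on `train` only; `val` is used to check how well it generalizes.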
Techniques & Methods
Upstream Sampling
Generating multiple candidate outputs and selecting the best based on predefined criteria.
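The idea can be sketched as a "best of N" selection. The candidates here are hard-coded stand-ins for outputs sampled from a model, and `brevity_with_keyword` is a hypothetical scoring criterion chosen only for illustration.

```python
def best_of_n(candidates, score):
    """Return the candidate with the highest score under the given criterion."""
    return max(candidates, key=score)

def brevity_with_keyword(text, keyword="refund"):
    """Hypothetical criterion: the answer must mention the keyword; shorter is better."""
    if keyword not in text.lower():
        return float("-inf")
    return -len(text)

# Stand-ins for several sampled model outputs.
CANDIDATES = [
    "Please contact support.",
    "Refunds take 5 days.",
    "Your refund will usually be processed within five business days.",
]

print(best_of_n(CANDIDATES, brevity_with_keyword))
```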
Techniques & Methods
Transfer Learning
Leveraging knowledge learned from one task or domain to improve performance on a related one.
Techniques & Methods
Training
Teaching a model to make accurate predictions by exposing it to large datasets.
Techniques & Methods
Topic Modeling
Statistical method for discovering abstract topics within large document collections.
Techniques & Methods
Text Classification
Automatically assigning predefined categories to text documents.
Techniques & Methods
System Prompt
Internal instructions that guide an AI model's behavior, tone, and response style.
Techniques & Methods
Supervised Fine-Tuning
Refining a pre-trained model's performance on a specific task using labeled example data.
Techniques & Methods
Sequence Generation
Process where models produce sequences—such as words or tokens—based on learned patterns.
Techniques & Methods
Semantic Similarity
Measure of how closely related two pieces of text are in meaning.
Techniques & Methods
Semantic Search
Search technology that retrieves results based on the meaning of a query rather than exact keyword matches — using embeddings to represent queries and documents as vectors and finding nearest neighbors in semantic space.
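A minimal sketch of the nearest-neighbor step, assuming the embeddings already exist. The three-dimensional vectors and document titles below are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return the top_k document ids ranked by similarity to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical embeddings (illustrative values only).
DOC_VECS = {
    "running shoes guide": [0.9, 0.1, 0.0],
    "marathon training plan": [0.7, 0.3, 0.1],
    "chocolate cake recipe": [0.0, 0.2, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.0]  # stands in for the embedding of "best sneakers for jogging"

print(semantic_search(QUERY_VEC, DOC_VECS))
```

Note that the query shares no keywords with the top results; the match comes entirely from vector proximity.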
Techniques & Methods
Semantic Annotation
Adding semantic metadata to content to improve AI understanding and processing.
Techniques & Methods
Self-Attention
Mechanism allowing a model to weigh the importance of each part of an input relative to all other parts.
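A simplified sketch of scaled dot-product self-attention in plain Python. It assumes identity projections (queries, keys, and values are all the raw input vectors), whereas a real transformer layer learns separate projection matrices for each.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each position attends to every position, including itself."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all input vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        )
    return outputs, weights

outputs, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(weights[0])
```

In this toy input, the first position gives more weight to itself and the identical third position than to the dissimilar second one.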
Techniques & Methods
Scaling Laws
Empirical relationships showing that model loss falls predictably, roughly as a power law, as parameter count, training data, and compute increase.
Techniques & Methods
Retrieval Augmented Generation (RAG)
An inference-time architecture that retrieves relevant documents from a knowledge base or web index and injects them into a language model's context before generation, grounding answers in real source material.
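The retrieve-then-inject flow can be sketched as follows. To stay self-contained, this sketch swaps embedding-based retrieval for simple word overlap and stops at building the prompt; the knowledge base, function names, and prompt wording are all hypothetical, and the call to an actual language model is omitted.

```python
def retrieve(query, knowledge_base, top_k=1):
    """Rank passages by shared words with the query (a stand-in for embedding search)."""
    query_words = set(query.lower().split())

    def overlap(passage):
        return len(query_words & set(passage.lower().split()))

    return sorted(knowledge_base, key=overlap, reverse=True)[:top_k]

def build_prompt(query, passages):
    """Inject the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base.
KNOWLEDGE_BASE = [
    "The store opens at 9 am on weekdays.",
    "Returns are accepted within 30 days of purchase.",
]

question = "When does the store open?"
print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE)))
```

The resulting prompt is what would be sent to the model, so its answer can draw on the retrieved source rather than on training data alone.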
Techniques & Methods
Response Quality
Evaluation of an AI response's relevance, coherence, accuracy, and helpfulness.
Techniques & Methods
Reinforcement Learning from Human Feedback (RLHF)
Training technique that refines AI models using feedback from human evaluators on output quality.
Techniques & Methods
Reinforcement Learning
An agent learns by taking actions in an environment and receiving rewards or penalties.
Techniques & Methods
Regularization
Techniques that prevent overfitting by penalizing model complexity during training.
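One common form of complexity penalty is L2 regularization (weight decay), sketched below. The function name and the penalty strength `lam` are illustrative assumptions.

```python
def l2_regularized_loss(errors, weights, lam=0.1):
    """Mean squared error plus a penalty proportional to the squared weights."""
    mse = sum(e * e for e in errors) / len(errors)
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty

# Same prediction errors, different weight magnitudes.
print(l2_regularized_loss([1.0, -1.0], [3.0, 4.0]))   # larger weights, larger loss
print(l2_regularized_loss([1.0, -1.0], [0.3, 0.4]))   # smaller weights, smaller loss
```

Because large weights raise the loss even when the fit is identical, training is nudged toward simpler models.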
Techniques & Methods
Query
A request for information or an action submitted to a database, search engine, or AI model.
Techniques & Methods
Proximal Policy Optimization (PPO)
Policy-gradient RL algorithm that stabilizes training by constraining how far each update can move the policy, typically through a clipped objective.
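One widely used way to constrain the update size is the clipped surrogate objective, sketched here for a single sample. The default `epsilon=0.2` is a commonly cited value, used here as an assumption; the function name is illustrative.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample.

    ratio is new_policy_prob / old_policy_prob for the action taken;
    advantage says how much better than expected that action turned out.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

print(ppo_clipped_objective(1.5, 1.0))    # gain is capped at the clipped ratio
print(ppo_clipped_objective(0.5, -1.0))   # penalty is bounded by the lower clip
```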
Techniques & Methods
Prompt Injection
Attack technique that manipulates AI behavior by embedding malicious instructions in inputs.
Techniques & Methods
Prompt Engineering
The discipline of designing input text — instructions, examples, constraints, and context — to reliably steer a language model toward accurate, well-formatted, and intent-aligned outputs without modifying model weights.
Techniques & Methods
Prompt
Text input provided to an AI model to guide the content and format of its response.
Techniques & Methods
Pre-training
Initial phase where a model learns general representations from large datasets before task-specific fine-tuning.
Techniques & Methods
Part-of-Speech Tagging (POS)
Labeling each word in text with its grammatical role such as noun, verb, or adjective.
Techniques & Methods
Overuse Penalty
Technique that discourages AI models from generating repetitive or overly similar responses.
Techniques & Methods
Online Learning
Training approach in which a model updates its parameters continuously as new data arrives, rather than being retrained in batches.
Techniques & Methods
One-Shot Learning
Model's ability to learn and make accurate predictions from only a single example.
Techniques & Methods
One-Shot / Few-Shot
Learning paradigms where models learn from one or very few examples to perform new tasks.
Techniques & Methods
Offline Reinforcement Learning
Learning optimal policies from fixed historical datasets without interacting with a live environment.
Techniques & Methods
Named Entity Recognition (NER)
Identifying and classifying named entities in text into predefined categories like people and places.
Techniques & Methods
Multitask Learning
Training a model on multiple related tasks simultaneously to improve performance on all of them.
Techniques & Methods
Masked Language Modeling
Training technique where the model predicts randomly hidden words in a sequence.
Techniques & Methods
Markov Decision Process
Mathematical framework modeling sequential decision-making in environments with probabilistic outcomes.
Techniques & Methods
Machine Translation
Software that automatically translates text or speech between languages.
Techniques & Methods
Low Rank Adaptation (LoRA)
Parameter-efficient fine-tuning technique that trains small low-rank update matrices while the original weights stay frozen, reducing the compute and memory needed to adapt large models.
Techniques & Methods
Linguistic Annotation
Adding linguistic metadata—such as POS tags, parse trees, or coreferences—to text for analysis.
Techniques & Methods
Knowledge Representation
Methods AI systems use to model, store, and reason over knowledge about the world.
Techniques & Methods
Joint Probability
The probability of two or more events occurring simultaneously.
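As a small worked example, the sketch below enumerates every outcome of rolling two fair dice and counts those where both events hold. The two events are chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

def joint_probability(event_a, event_b):
    """P(A and B) over all equally likely rolls of two six-sided dice."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = [roll for roll in outcomes if event_a(roll) and event_b(roll)]
    return Fraction(len(hits), len(outcomes))

def first_is_even(roll):
    return roll[0] % 2 == 0

def sum_is_eight(roll):
    return sum(roll) == 8

# Matching rolls: (2, 6), (4, 4), (6, 2), so 3 of 36 outcomes.
print(joint_probability(first_is_even, sum_is_eight))  # 1/12
```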
Techniques & Methods
Information Extraction
Automatically extracting structured information from unstructured text.
Techniques & Methods
Inference
Using a trained AI model to generate predictions or responses on new, unseen data.
Techniques & Methods
Heuristics
Practical problem-solving approaches using rules of thumb rather than exhaustive search.
Techniques & Methods
Hallucination
When a language model generates confident-sounding text that is factually wrong, invented, or misattributed — a structural consequence of next-token prediction over learned patterns rather than retrieval from a verified knowledge base.
Techniques & Methods
Greedy Algorithms
Algorithms that make the locally optimal choice at each step in the hope of reaching a globally optimal solution, which is not guaranteed for every problem.
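Coin change is a classic illustration: always taking the largest coin that fits works for some coin systems but not others. The function below is a minimal sketch, and the coin sets are illustrative.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that fits; None if the amount cannot be reached."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used if amount == 0 else None

print(greedy_change(30, [25, 10, 5, 1]))  # [25, 5], which is also the optimal answer
print(greedy_change(6, [1, 3, 4]))        # [4, 1, 1], although [3, 3] uses fewer coins
```

The second call shows a locally best choice (taking the 4) leading to a worse overall result.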
Techniques & Methods
Generation
Producing new text, code, or content based on learned patterns and a given input prompt.
Techniques & Methods
Forward Chaining
Logical reasoning that starts with known facts and applies rules to derive conclusions.
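A minimal sketch of this data-driven loop, with rules written as (premises, conclusion) pairs. The rule set and fact names are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply rules to known facts until nothing new can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)   # a rule fired, so a new fact is known
                changed = True
    return known

# Hypothetical rules: (premises, conclusion).
RULES = [
    (("is_bird", "cannot_fly"), "is_flightless_bird"),
    (("has_feathers", "lays_eggs"), "is_bird"),
]

print(forward_chain({"has_feathers", "lays_eggs", "cannot_fly"}, RULES))
```

The second rule fires first and produces `is_bird`, which then lets the first rule fire on the next pass.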
Techniques & Methods
Fine-Tuning
Continuing the training of a pre-trained foundation model on a smaller, curated dataset to specialize its behavior, style, or domain expertise, ideally while retaining its general capabilities.
Techniques & Methods
Fine-Grained Control
Capability to precisely adjust AI output characteristics, format, style, or content.
Techniques & Methods
Few-Shot Learning
Model's ability to generalize from only a handful of labeled examples.
Techniques & Methods
Feature Extraction
Identifying and isolating the most useful information from raw data for model training.
Techniques & Methods
Extractive Summarization
Creating summaries by selecting and combining key sentences directly from the source text.
Techniques & Methods
Evaluation Metrics
Quantitative measures used to assess how well an AI model performs on a task.
Techniques & Methods
Entity Extraction
Identifying and classifying named entities—people, places, organizations—within text.
Techniques & Methods
Entity Annotation
Labeling text spans with entity type information to create structured training data.
Techniques & Methods
Distributed Training
Spreading model training across multiple GPUs or servers to handle large-scale models and datasets.
Techniques & Methods
Dependency Parsing
Analyzing grammatical structure to identify dependency relationships between words in a sentence.
Techniques & Methods
Decoding Rules
Guidelines and algorithms that control how language models translate internal representations into output tokens.
Techniques & Methods
Data Mining
Examining large databases to discover patterns and correlations and to generate new insights.
Techniques & Methods
Data Augmentation
Increasing training dataset size and diversity by creating modified copies of existing data.
Techniques & Methods
Coreference Resolution
Determining which words or phrases in text refer to the same real-world entity.
Techniques & Methods
Completion
The output produced by an AI language model in response to a given input or prompt.
Techniques & Methods
Chain-of-Thought
A prompting and reasoning technique in which a language model is encouraged to produce step-by-step intermediate reasoning before its final answer — empirically improving accuracy on multi-step problems, especially math, logic, and code.
Techniques & Methods
Beam Search
Search algorithm that maintains multiple candidate sequences to find high-quality generated outputs.
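A toy sketch of the procedure, using a hand-written table of next-token probabilities in place of a real language model. The table, the tokens, and the function signature are all hypothetical.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width=2, max_steps=3):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    beams = [(["<s>"], 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            last = tokens[-1]
            if last == "</s>":            # finished sequences carry over unchanged
                candidates.append((tokens, score))
                continue
            for token, prob in NEXT_TOKEN_PROBS[last].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search(beam_width=2)[0][0])  # ['<s>', 'a', 'cat', '</s>']
print(beam_search(beam_width=1)[0][0])  # ['<s>', 'the', 'cat', '</s>']
```

With a beam width of 1 the search is greedy and commits to "the" (0.6), ending at a sequence with probability 0.33. Keeping two candidates lets it find "a cat" at 0.36.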
Techniques & Methods
Bandit Optimization
Strategy balancing exploration of unknown options with exploitation of known high-reward choices.
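Epsilon-greedy is one simple strategy of this kind, sketched below on a simulated multi-armed bandit. The reward probabilities, step count, and `epsilon` value are illustrative assumptions.

```python
import random

def epsilon_greedy(true_reward_probs, steps=1000, epsilon=0.1, seed=0):
    """With probability epsilon try a random arm; otherwise pull the best-looking one."""
    rng = random.Random(seed)
    n_arms = len(true_reward_probs)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                             # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])    # exploit
        reward = 1.0 if rng.random() < true_reward_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean
    return counts, estimates

counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates])
```

A higher `epsilon` spends more pulls on exploration; a lower one commits sooner to whichever arm currently looks best.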
Techniques & Methods
Backward Chaining
Goal-driven reasoning that works backward from a desired conclusion to find supporting facts.
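A minimal recursive sketch, with rules written as (premises, conclusion) pairs. The rule set and fact names are hypothetical, and the `visiting` argument is an added guard against circular rules.

```python
def backward_chain(goal, facts, rules, visiting=frozenset()):
    """Return True if the goal is a known fact or follows from rules with provable premises."""
    if goal in facts:
        return True
    if goal in visiting:               # avoid infinite recursion on circular rules
        return False
    visiting = visiting | {goal}
    return any(
        all(backward_chain(p, facts, rules, visiting) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

# Hypothetical rules: (premises, conclusion).
RULES = [
    (("has_feathers", "lays_eggs"), "is_bird"),
    (("is_bird", "cannot_fly"), "is_flightless_bird"),
]

FACTS = {"has_feathers", "lays_eggs", "cannot_fly"}
print(backward_chain("is_flightless_bird", FACTS, RULES))
```

The search starts from `is_flightless_bird`, finds the rule that concludes it, and then tries to prove each premise in turn.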
Techniques & Methods
Backpropagation
Training algorithm that adjusts neural network weights by propagating prediction errors backward through the network.
Techniques & Methods
Autoregression
Statistical modeling approach where future values are predicted from past observed values.
Techniques & Methods
Attention Mechanism
Neural network technique enabling models to focus on the most relevant parts of input when producing each output.
Techniques & Methods
Attention
Core mechanism in transformers that dynamically weights the importance of different input positions.
Techniques & Methods
AI Alignment
The research field and engineering practice of building AI systems that reliably pursue goals humans actually want, remain controllable, and avoid harmful side effects — operationalized through RLHF, Constitutional AI, evaluations, and interpretability.
Techniques & Methods
Adversarial Training
Training AI models on challenging, adversarially crafted inputs to improve robustness and reliability.
Model Components
Transformers
Class of deep learning models based on self-attention that have revolutionized NLP and AI.
Model Components
Transformer Decoder
Transformer component that generates output sequences by attending to encoded inputs and prior outputs.
Model Components
Transformer
A neural-network architecture, introduced by Vaswani et al. in 2017, that uses self-attention and parallel computation across all sequence positions — the foundation under virtually every frontier language and multimodal model in production today.
Model Components
Sequence-to-Sequence (Seq2Seq) Models
Models that transform input sequences into output sequences, used in translation and summarization.
Model Components
Reward Models
Models trained to score AI outputs based on human preferences for use in reinforcement learning.
Model Components
Retrieval Model
Model that finds and returns the most relevant documents or passages from a large corpus given a query.
Model Components
Recurrent Neural Network (RNN)
Neural network with loops enabling it to maintain hidden state across sequential inputs.
Model Components
Predictive Model
A model that uses learned patterns to forecast unknown or future values.
Model Components
Parameter
A learnable variable within a model whose value is adjusted during training to minimize prediction error.
Model Components
Neural Network
Computational system of interconnected nodes inspired by the human brain that learns to recognize patterns.
Model Components
Large Language Model (LLM)
A transformer-based neural network with billions to trillions of parameters, trained on broad text corpora to predict the next token and able to generate, summarize, classify, and reason over natural language.
Model Components
Language Model
AI system that assigns probabilities to sequences of words and can generate coherent text.
Model Components
Model Card
Standardized documentation describing an AI model's intended uses, limitations, and evaluation results.
Model Components
Model Architecture
The specific structure of an AI model: its layers, connections, and component design.
Model Components
Model
A mathematical system trained on data to represent real-world patterns and make predictions.
Model Components
Maximum Response Length
The upper limit on the number of tokens a model can generate in a single response.
Model Components
Generative Pre-trained Transformer (GPT)
A family of decoder-only Transformer language models — pioneered by OpenAI — that combines large-scale unsupervised pre-training on text with task-specific alignment to produce general-purpose text generation.
Model Components
Generative Model
AI model that learns to generate new data instances resembling the training distribution.
Model Components
Generative Adversarial Network (GAN)
Framework that trains two competing networks, a generator and a discriminator, to produce realistic synthetic data.
Model Components
Generator
GAN component that creates synthetic data instances designed to be indistinguishable from real data.
Model Components
Foundation Model
Large versatile model trained on broad data that serves as a base for diverse downstream applications.
Model Components
Encoder
Transformer component that processes input sequences into rich contextual representations.
Model Components
Embeddings
Dense numerical vectors that represent text, images, or other content in a high-dimensional space where semantically similar items are geometrically close — the foundational data structure for semantic search and RAG retrieval.
Model Components
Discriminator (in GAN)
GAN component that learns to distinguish real data from fake data generated by the generator.
Model Components
Context Window
The maximum number of tokens a language model can process in a single inference pass — everything the model "sees" at once, including system prompt, conversation history, retrieved documents, and the response being generated.
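Because everything must fit in one budget, applications often trim older conversation turns. The sketch below shows one possible approach; it counts whitespace-separated words as a crude stand-in for a real tokenizer, and the function names and limits are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_context(system_prompt, history, max_tokens, reserve_for_response):
    """Keep the system prompt, then as many of the most recent messages as fit."""
    budget = max_tokens - reserve_for_response - count_tokens(system_prompt)
    kept = []
    for message in reversed(history):      # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                          # older messages no longer fit
        kept.append(message)
        budget -= cost
    return [system_prompt] + kept[::-1]    # restore chronological order

history = ["one two three four", "five six", "seven eight nine"]
print(fit_to_context("Be concise.", history, max_tokens=12, reserve_for_response=4))
```

Here the oldest message is dropped so that the system prompt, the two most recent messages, and space reserved for the response all fit within 12 tokens.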
Model Components
Contextual Embeddings
Word representations that change based on surrounding context, unlike static word embeddings.
Model Components
Bounding Box
Rectangular region used to localize objects within images in computer vision tasks.
Model Components
Autoregressive Model
Model that generates each output element by conditioning on all previously generated elements.
Model Components
Artificial Neural Network
Computing system loosely inspired by biological neural networks, consisting of layers of connected nodes.
Model Components
API (Application Programming Interface)
Interface that allows software applications to communicate and share functionality with each other.
Model Components
GPT-3 (Generative Pre-trained Transformer 3)
OpenAI's 175-billion-parameter language model, released in 2020, that demonstrated remarkable few-shot learning.
Applications
User Interface (UI)
The means by which humans interact with a computer system or AI application.
Applications
Sentiment Analysis
Automatically identifying and categorizing expressed opinions in text to determine attitude.
Applications
QA (Question Answering)
AI system that automatically produces answers to human questions posed in natural language.
Applications
Predictive Analytics
Using historical data and ML models to forecast likely future outcomes.
Applications
Plugins / Tools
Extensions that allow AI systems to interact with external services, APIs, and data sources.
Applications
Multi-turn Dialogue
Conversations involving multiple exchanges where the AI maintains context across all prior turns.
Applications
Moderation Tools
Systems that monitor and filter AI outputs and user inputs to enforce content guidelines.
Applications
Enterprise AI
Application of AI technologies to improve business processes, efficiency, and decision-making.
Applications
Dialogue System
AI system designed to carry on natural, coherent conversations with human users.
Applications
CRM with AI
Customer relationship management systems augmented with AI to improve sales, service, and marketing outcomes.
Applications
ChatGPT
OpenAI's consumer conversational AI assistant, launched in November 2022, built on the GPT family of language models and trained with RLHF to follow instructions, maintain conversational context, and decline harmful requests.
Applications
Chatbot
Software application that simulates human conversation via text or voice interfaces.
Applications
AI Agents
AI systems that combine a language model with tools, memory, and planning to autonomously execute multi-step tasks — observing outcomes, deciding next actions, and iterating until a goal is reached.
Applications
InstructGPT
GPT variant fine-tuned with RLHF to follow instructions accurately and produce aligned responses.
Miscellaneous
Yeoman's Work
Diligent, thorough work that may be repetitive but is essential and dependable.
Miscellaneous
Vector Store
Specialized database for storing, indexing, and efficiently retrieving high-dimensional vector embeddings.
Miscellaneous
Validation Data
A held-out data split used during training to tune hyperparameters and monitor generalization.
Miscellaneous
Training Data
The labeled or unlabeled dataset used to fit a model's parameters during the learning process.
Miscellaneous
Test Data
A held-out dataset used only once, at the end, to obtain an unbiased estimate of final model performance.
Miscellaneous
System Message
Predefined instruction provided to an AI model before the conversation that guides its behavior.
Miscellaneous
Sandbox Environment
Isolated testing environment where code or AI models can run safely without affecting production systems.
Miscellaneous
Python
High-level programming language that is the dominant language for AI and machine learning development.
Miscellaneous
OpenAI
AI research organization that created the GPT models, ChatGPT, DALL-E, and Codex, and helped pioneer RLHF for alignment.
Miscellaneous
Label
Annotation indicating the correct output or category for a training example in supervised learning.
Miscellaneous
Knowledge Base
Centralized repository of structured and unstructured information used to provide AI systems with domain knowledge.
Miscellaneous
Dataset
An organized collection of data examples prepared for training, evaluating, or testing AI models.
Miscellaneous
Data Science
Interdisciplinary field combining statistics, programming, and domain knowledge to extract insights from data.
Miscellaneous
Data Privacy
Practices and regulations ensuring personal and sensitive data is collected, stored, and processed responsibly.
Miscellaneous
Corpus
A large collection of text used for training language models or conducting linguistic research.
Miscellaneous
Deployment
The process of making a trained AI model available for real-world use in production environments.
General
AI Trainer
Specialist who improves AI models by providing structured feedback, creating training data, and evaluating outputs.
