Techniques & Methods

Semantic Similarity

Semantic similarity quantifies how close two texts are in meaning, independent of the specific words they use. It is usually computed by encoding each text as a vector embedding and comparing the two vectors with cosine similarity or another similarity or distance measure. A high score indicates that the texts convey comparable information, even when they share few words.
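The comparison step can be illustrated with a minimal sketch of cosine similarity, the dot product of two vectors divided by the product of their lengths. The 4-dimensional vectors below are hypothetical stand-ins for embeddings; real sentence embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (not produced by any real model).
cat_sat = [0.9, 0.1, 0.3, 0.0]        # e.g. "The cat sat on the mat."
kitten_rested = [0.8, 0.2, 0.4, 0.1]  # e.g. "A kitten rested on the rug."
stock_prices = [0.0, 0.9, 0.0, 0.8]   # e.g. "Stock prices fell sharply."

print(round(cosine_similarity(cat_sat, kitten_rested), 3))
print(round(cosine_similarity(cat_sat, stock_prices), 3))
```

The two sentences about a resting cat point in nearly the same direction and score close to 1, while the unrelated sentence scores near 0. Cosine similarity ignores vector length, which is one reason it is a common default for comparing embeddings.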

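Applications include duplicate detection, question matching, semantic search, and retrieval in retrieval-augmented generation (RAG) systems. Embedding models trained specifically for semantic similarity, such as those built with the sentence-transformers library, generally produce substantially better similarity scores than embeddings taken directly from general-purpose LLMs.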
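Two of these applications, semantic search (ranking documents against a query) and duplicate or question matching (flagging pairs above a similarity threshold), can be sketched with a brute-force scan. The document IDs, vectors, and the 0.95 threshold are hypothetical; in practice the vectors would come from an embedding model (for example via the sentence-transformers library's `SentenceTransformer.encode` method) and the threshold would be tuned on labeled data.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int = 2):
    """Semantic search: return the k (doc_id, score) pairs closest to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(corpus: dict[str, list[float]], threshold: float = 0.95):
    """Duplicate detection: return (id_a, id_b, score) for every pair scoring >= threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(corpus.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical precomputed embeddings for a small FAQ corpus.
corpus = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login": [0.85, 0.15, 0.05],
    "shipping-times": [0.05, 0.9, 0.3],
    "return-policy": [0.1, 0.6, 0.7],
}
query = [0.88, 0.12, 0.02]  # hypothetical embedding of "How do I change my password?"

for doc_id, score in top_k(query, corpus, k=2):
    print(f"{doc_id}: {score:.3f}")

for id_a, id_b, score in near_duplicates(corpus):
    print(f"possible duplicate: {id_a} <-> {id_b} ({score:.3f})")
```

Scoring every document against the query is fine for a handful of items, but its cost grows linearly with corpus size; large semantic search and RAG deployments usually replace the scan with an approximate nearest-neighbor index.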

Authority Links

Semantic Similarity — Wikipedia

Measures and methods for computing semantic similarity.

Sentence-Transformers

Python library for state-of-the-art sentence and text embeddings.

Related Terms

Techniques & Methods

Word Embedding

Technique representing words as dense vectors that capture semantic similarity.

Model Components

Embeddings

Dense numerical vectors that represent text, images, or other content in a high-dimensional space where semantically similar items are geometrically close — the foundational data structure for semantic search and RAG retrieval.

Techniques & Methods

Semantic Search

Search technology that retrieves results based on the meaning of a query rather than exact keyword matches — using embeddings to represent queries and documents as vectors and finding nearest neighbors in semantic space.

Techniques & Methods

Vector Representation

Encoding words, sentences, or concepts as numerical vectors so that AI systems can compare and retrieve them.
