25 changed files with 1445 additions and 41 deletions
@@ -0,0 +1,167 @@
|
||||
[[vector-search]] |
||||
= Vector Search |
||||
|
||||
With the rise of generative AI, vector databases have gained strong traction.
||||
These databases enable efficient storage and querying of high-dimensional vectors, making them well-suited for tasks such as semantic search, recommendation systems, and natural language understanding. |
||||
|
||||
Vector search is a technique that retrieves semantically similar data by comparing vector representations (also known as embeddings) rather than relying on traditional exact-match queries. |
||||
This approach enables intelligent, context-aware applications that go beyond keyword-based retrieval. |
||||
|
||||
In the context of Spring Data, vector search opens new possibilities for building intelligent, context-aware applications, particularly in domains like natural language processing, recommendation systems, and generative AI. |
||||
By modelling vector-based querying using familiar repository abstractions, Spring Data lets developers integrate vector-capable databases while keeping the simplicity and consistency of the Spring Data programming model.
||||
|
||||
ifdef::vector-search-intro-include[] |
||||
include::{vector-search-intro-include}[] |
||||
endif::[] |
||||
|
||||
[[vector-search.model]] |
||||
== Vector Model |
||||
|
||||
To support vector search in a type-safe and idiomatic way, Spring Data introduces the following core abstractions: |
||||
|
||||
* <<vector-search.model.vector,`Vector`>> |
||||
* <<vector-search.model.search-result,`SearchResults<T>` and `SearchResult<T>`>> |
||||
* <<vector-search.model.scoring,`Score`, `Similarity` and Scoring Functions>> |
||||
|
||||
[[vector-search.model.vector]] |
||||
=== `Vector` |
||||
|
||||
The `Vector` type represents an n-dimensional numerical embedding, typically produced by embedding models. |
||||
In Spring Data, it is defined as a lightweight wrapper around an array of floating-point numbers, ensuring immutability and consistency. |
||||
This type can be used as an input for search queries or as a property on a domain entity to store the associated vector representation. |
||||
|
||||
==== |
||||
[source,java] |
||||
---- |
||||
Vector vector = Vector.of(0.23f, 0.11f, 0.77f); |
||||
---- |
||||
==== |
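To make the idea of an immutable wrapper around a floating-point array more concrete, the following is a minimal sketch. It is not Spring Data's `Vector` implementation; `FloatVector` and its methods are hypothetical names chosen for illustration. It only shows how defensive copies keep the wrapped values from changing after creation.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only, not Spring Data's Vector type.
final class FloatVector {

	private final float[] values;

	private FloatVector(float[] values) {
		this.values = values;
	}

	// Copies the input so later changes to the caller's array do not affect this instance.
	static FloatVector of(float... values) {
		return new FloatVector(Arrays.copyOf(values, values.length));
	}

	// Number of dimensions.
	int size() {
		return values.length;
	}

	// Returns a copy so callers cannot mutate the internal state.
	float[] toFloatArray() {
		return Arrays.copyOf(values, values.length);
	}

	@Override
	public boolean equals(Object o) {
		return o instanceof FloatVector other && Arrays.equals(values, other.values);
	}

	@Override
	public int hashCode() {
		return Arrays.hashCode(values);
	}
}
```

Because both the factory method and the accessor copy the array, two instances created from equal values compare as equal regardless of what happens to the source arrays afterwards.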
||||
|
||||
Using `Vector` in your domain model removes the need to work with raw arrays or lists of numbers, providing a more type-safe and expressive way to handle vector data. |
||||
This abstraction also allows for easy integration with various vector databases and libraries. |
||||
It also allows for implementing vendor-specific optimizations such as binary or quantized vectors that do not map to a standard floating-point (`float` and `double` as defined by https://en.wikipedia.org/wiki/IEEE_754[IEEE 754]) representation.
||||
A domain object can have a vector property, which can be used for similarity searches. |
||||
Consider the following example: |
||||
|
||||
ifdef::vector-search-model-include[] |
||||
include::{vector-search-model-include}[] |
||||
endif::[] |
||||
|
||||
NOTE: Associating a vector with a domain object results in the vector being loaded and stored as part of the entity lifecycle, which may introduce additional overhead on retrieval and persistence operations. |
||||
|
||||
[[vector-search.model.search-result]] |
||||
=== Search Results |
||||
|
||||
The `SearchResult<T>` type encapsulates the results of a vector similarity query. |
||||
It includes both the matched domain object and a relevance score that indicates how closely it matches the query vector. |
||||
This abstraction provides a structured way to handle result ranking and enables developers to easily work with both the data and its contextual relevance. |
||||
|
||||
ifdef::vector-search-repository-include[] |
||||
include::{vector-search-repository-include}[] |
||||
endif::[] |
||||
|
||||
In this example, the `searchByCountryAndEmbeddingNear` method returns a `SearchResults<Comment>` object, which contains a list of `SearchResult<Comment>` instances. |
||||
Each result includes the matched `Comment` entity and its relevance score. |
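The pairing of content and score can be sketched without any Spring Data dependency. The `Scored` record and the `contentOf` helper below are hypothetical stand-ins, not the actual `SearchResult<T>` or `SearchResults<T>` types. They only illustrate that mapping the content keeps the score attached, and that the plain content can be extracted when the score is not needed.

```java
import java.util.List;
import java.util.function.Function;

final class SearchResultSketch {

	// Hypothetical stand-in: one matched item together with its relevance score.
	record Scored<T>(T content, double score) {

		// Converts the content and carries the score over unchanged.
		<U> Scored<U> map(Function<? super T, ? extends U> converter) {
			return new Scored<>(converter.apply(content), score);
		}
	}

	// Drops the scores and returns only the matched items, preserving order.
	static <T> List<T> contentOf(List<Scored<T>> results) {
		return results.stream().map(Scored::content).toList();
	}
}
```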
||||
|
||||
The relevance score is a numerical value that indicates how closely the matched vector aligns with the query vector.
||||
Depending on whether a score represents distance or similarity, a higher score can mean either a more distant match or a closer one.
||||
|
||||
The scoring function used to calculate this score can vary based on the underlying database, index or input parameters. |
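The effect of score direction on ranking can be sketched as follows. This is an in-memory illustration under the assumption that the caller knows whether higher scores are better; `Hit` and `rank` are hypothetical names, and in practice the database performs the ordering.

```java
import java.util.Comparator;
import java.util.List;

final class RankingSketch {

	// Hypothetical result holder: an identifier plus its score.
	record Hit(String id, double score) {}

	// Orders best match first: descending for similarity scores, ascending for distance scores.
	static List<Hit> rank(List<Hit> hits, boolean higherIsBetter) {
		Comparator<Hit> byScore = Comparator.comparingDouble(Hit::score);
		return hits.stream().sorted(higherIsBetter ? byScore.reversed() : byScore).toList();
	}
}
```

The same two scores therefore produce opposite rankings depending on whether they are read as similarities or as distances.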
||||
|
||||
[[vector-search.model.scoring]] |
||||
=== Score, Similarity, and Scoring Functions |
||||
|
||||
The `Score` type holds a numerical value indicating the relevance of a search result. |
||||
It can be used to rank results based on their similarity to the query vector. |
||||
The `Score` type is typically a floating-point number, and its interpretation (higher is better or lower is better) depends on the specific similarity function used. |
||||
Scores are a by-product of vector search and are not required for a successful search operation. |
||||
Score values are not part of a domain model and are therefore best represented as out-of-band data.
||||
|
||||
Generally, a `Score` is computed by a `ScoringFunction`.
||||
The actual scoring function used to calculate this score depends on the underlying database and can be obtained from a search index or input parameters.
||||
|
||||
Spring Data declares constants for commonly used functions, such as:
||||
|
||||
Euclidean distance:: Calculates the straight-line distance in n-dimensional space involving the square root of the sum of squared differences. |
||||
Cosine similarity:: Measures the angle between two vectors by calculating their dot product and dividing it by the product of their lengths.
||||
Dot product:: Computes the sum of element-wise multiplications. |
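The three functions above can be written out in plain Java. This is a sketch of the underlying math only, assuming both vectors have the same number of dimensions. It is not how Spring Data or any database computes scores, and `ScoringSketch` is a hypothetical class name.

```java
final class ScoringSketch {

	// Sum of element-wise multiplications.
	static double dotProduct(float[] a, float[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			sum += (double) a[i] * b[i];
		}
		return sum;
	}

	// Square root of the sum of squared differences.
	static double euclideanDistance(float[] a, float[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			double diff = (double) a[i] - b[i];
			sum += diff * diff;
		}
		return Math.sqrt(sum);
	}

	// Dot product divided by the product of both vector lengths.
	// Yields NaN if either vector has zero length.
	static double cosineSimilarity(float[] a, float[] b) {
		return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
	}

	private static void requireSameLength(float[] a, float[] b) {
		if (a.length != b.length) {
			throw new IllegalArgumentException("Vectors must have the same number of dimensions");
		}
	}
}
```

Note that cosine similarity ignores magnitude: a vector and a scaled copy of it are maximally similar, whereas their Euclidean distance and dot product both change with the scale.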
||||
|
||||
The choice of similarity function can impact both the performance and semantics of the search and is often determined by the underlying database or index being used. |
||||
Spring Data adapts to the database's native scoring function capabilities and to whether the score can be used to limit results.
||||
|
||||
ifdef::vector-search-scoring-include[] |
||||
include::{vector-search-scoring-include}[] |
||||
endif::[] |
||||
|
||||
[[vector-search.methods]] |
||||
== Vector Search Methods |
||||
|
||||
Vector search methods are defined in repositories using the same conventions as standard Spring Data query methods. |
||||
These methods return `SearchResults<T>` and require a `Vector` parameter to define the query vector. |
||||
The implementation depends on the internals of the underlying data store and its vector search capabilities.
||||
|
||||
NOTE: If you are new to Spring Data repositories, make sure to familiarize yourself with the xref:repositories/core-concepts.adoc[basics of repository definitions and query methods]. |
||||
|
||||
Generally, you have the choice of declaring a search method using two approaches: |
||||
|
||||
* Query Derivation |
||||
* Declaring a String-based Query |
||||
|
||||
Generally, Vector Search methods must declare a `Vector` parameter to define the query vector. |
||||
|
||||
[[vector-search.method.derivation]] |
||||
=== Derived Search Methods |
||||
|
||||
A derived search method uses the name of the method to derive the query. |
||||
Vector search supports the following keywords when declaring a search method:
||||
|
||||
.Query predicate keywords |
||||
[options="header",cols="1,3"] |
||||
|=============== |
||||
|Logical keyword|Keyword expressions |
||||
|`NEAR`|`Near`, `IsNear` |
||||
|`WITHIN`|`Within`, `IsWithin` |
||||
|=============== |
||||
|
||||
ifdef::vector-search-method-derived-include[] |
||||
include::{vector-search-method-derived-include}[] |
||||
endif::[] |
||||
|
||||
Derived search methods are typically easier to read and maintain, as they rely on the method name to express the query intent. |
||||
However, to limit search results by their score, a derived search method must declare a `Score`, `Range<Score>`, or `ScoringFunction` as the second argument to the `Near`/`Within` keyword.
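What limiting by a score range means can be sketched in memory. The example assumes inclusive lower and upper bounds; `ScoreRange` and `within` are hypothetical names. In a real application the store applies the limit as part of the query, so this only illustrates the semantics.

```java
import java.util.List;

final class ScoreRangeSketch {

	// Hypothetical stand-in for an inclusive range over score values.
	record ScoreRange(double min, double max) {

		ScoreRange {
			if (min > max) {
				throw new IllegalArgumentException("min must not exceed max");
			}
		}

		// Both bounds are treated as inclusive.
		boolean contains(double score) {
			return score >= min && score <= max;
		}
	}

	// Keeps only the scores that fall inside the range, preserving order.
	static List<Double> within(List<Double> scores, ScoreRange range) {
		return scores.stream().filter(range::contains).toList();
	}
}
```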
||||
|
||||
[[vector-search.method.string]] |
||||
=== Annotated Search Methods |
||||
|
||||
Annotated methods provide full control over the query semantics and parameters. |
||||
Unlike derived methods, they do not rely on method name conventions. |
||||
|
||||
ifdef::vector-search-method-annotated-include[] |
||||
include::{vector-search-method-annotated-include}[] |
||||
endif::[] |
||||
|
||||
With more control over the actual query, Spring Data can make fewer assumptions about the query and its parameters. |
||||
For example, `Similarity` normalization uses the native score function within the query to normalize the given similarity into a score predicate value and vice versa. |
||||
If an annotated query does not define the score, for example, the score value in the returned `SearchResult<T>` will be zero.
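To give a feel for what normalizing between a native score and a similarity in the `[0, 1]` range involves, here is a sketch of two conversions in both directions. The formulas are assumptions chosen for illustration. Actual normalization differs between databases and is not specified here, and `SimilarityNormalizationSketch` is a hypothetical class name.

```java
final class SimilarityNormalizationSketch {

	// Assumed mapping: cosine value in [-1, 1] to similarity in [0, 1].
	static double cosineToSimilarity(double cosine) {
		return (1 + cosine) / 2;
	}

	// Inverse of the assumed cosine mapping.
	static double similarityToCosine(double similarity) {
		return 2 * similarity - 1;
	}

	// Assumed mapping: Euclidean distance in [0, infinity) to similarity in (0, 1].
	static double euclideanToSimilarity(double distance) {
		return 1 / (1 + distance);
	}

	// Inverse of the assumed Euclidean mapping; a similarity of 0 yields infinity.
	static double similarityToEuclidean(double similarity) {
		return 1 / similarity - 1;
	}
}
```

With such a pair of functions, a similarity threshold passed to a search method can be translated into a native score predicate, and native scores returned by the store can be translated back into similarities.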
||||
|
||||
[[vector-search.method.sorting]] |
||||
=== Sorting |
||||
|
||||
By default, search results are ordered according to their score. |
||||
You can override sorting by using the `Sort` parameter: |
||||
|
||||
.Using `Sort` in Repository Search Methods |
||||
==== |
||||
[source,java] |
||||
---- |
||||
interface CommentRepository extends Repository<Comment, String> { |
||||
|
||||
SearchResults<Comment> searchByEmbeddingNearOrderByCountry(Vector vector, Score score); |
||||
|
||||
SearchResults<Comment> searchByEmbeddingWithin(Vector vector, Score score, Sort sort); |
||||
} |
||||
---- |
||||
==== |
||||
|
||||
Note that custom sorting does not allow expressing the score as a sorting criterion.
||||
You can only refer to domain properties. |
||||
@@ -0,0 +1,118 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import java.io.Serializable; |
||||
|
||||
import org.springframework.util.ObjectUtils; |
||||
|
||||
/** |
||||
* Value object representing a search result score computed via a {@link ScoringFunction}. |
||||
* <p> |
||||
* Encapsulates the numeric score and the scoring function used to derive it. Scores are primarily used to rank search |
||||
* results. Depending on the used {@link ScoringFunction} higher scores can indicate either a higher distance or a |
||||
* higher similarity. Use the {@link Similarity} class to indicate a normalized score that effectively represents
* similarity.
||||
* <p> |
||||
* Instances of this class are immutable and suitable for use in comparison, sorting, and range operations. |
||||
* |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
* @see Similarity |
||||
*/ |
||||
public sealed class Score implements Serializable permits Similarity { |
||||
|
||||
private final double value; |
||||
private final ScoringFunction function; |
||||
|
||||
Score(double value, ScoringFunction function) { |
||||
this.value = value; |
||||
this.function = function; |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link Score} from a plain {@code score} value using {@link ScoringFunction#unspecified()}. |
||||
* |
||||
* @param score the score value without a specific {@link ScoringFunction}. |
||||
* @return the new {@link Score}. |
||||
*/ |
||||
public static Score of(double score) { |
||||
return of(score, ScoringFunction.unspecified()); |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link Score} from a {@code score} value using the given {@link ScoringFunction}. |
||||
* |
||||
* @param score the score value. |
||||
* @param function the scoring function that has computed the {@code score}. |
||||
* @return the new {@link Score}. |
||||
*/ |
||||
public static Score of(double score, ScoringFunction function) { |
||||
return new Score(score, function); |
||||
} |
||||
|
||||
/** |
||||
* Creates a {@link Range} from the given minimum and maximum {@code Score} values. |
||||
* |
||||
* @param min the lower score value, must not be {@literal null}. |
||||
* @param max the upper score value, must not be {@literal null}. |
||||
* @return a {@link Range} over {@link Score} bounds. |
||||
*/ |
||||
public static Range<Score> between(Score min, Score max) { |
||||
return Range.from(Range.Bound.inclusive(min)).to(Range.Bound.inclusive(max)); |
||||
} |
||||
|
||||
/** |
||||
* Returns the raw numeric value of the score. |
||||
* |
||||
* @return the score value. |
||||
*/ |
||||
public double getValue() { |
||||
return value; |
||||
} |
||||
|
||||
/** |
||||
* Returns the {@link ScoringFunction} that was used to compute this score. |
||||
* |
||||
* @return the associated scoring function. |
||||
*/ |
||||
public ScoringFunction getFunction() { |
||||
return function; |
||||
} |
||||
|
||||
@Override |
||||
public boolean equals(Object o) { |
||||
if (!(o instanceof Score other)) { |
||||
return false; |
||||
} |
||||
if (Double.compare(value, other.value) != 0) { // consistent with hashCode for NaN and signed zero
||||
return false; |
||||
} |
||||
return ObjectUtils.nullSafeEquals(function, other.function); |
||||
} |
||||
|
||||
@Override |
||||
public int hashCode() { |
||||
return ObjectUtils.nullSafeHash(value, function); |
||||
} |
||||
|
||||
@Override |
||||
public String toString() { |
||||
return function instanceof UnspecifiedScoringFunction ? Double.toString(value) |
||||
: "%s (%s)".formatted(Double.toString(value), function.getName()); |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,87 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
/** |
||||
* Strategy interface for scoring functions. |
||||
* <p> |
||||
* Implementations define how the score (distance or similarity) between two vectors is computed, allowing control over
||||
* ranking behavior in search queries. |
||||
* <p> |
||||
* Provides commonly used scoring variants via static factory methods. See {@link VectorScoringFunctions} for the |
||||
* concrete implementations. |
||||
* |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
* @see Score |
||||
* @see Similarity |
||||
*/ |
||||
public interface ScoringFunction { |
||||
|
||||
/** |
||||
* Returns the default {@code ScoringFunction} to be used when none is explicitly specified. |
||||
* <p> |
||||
* This is typically used to indicate the absence of a scoring definition. |
||||
* |
||||
* @return the default {@code ScoringFunction} instance. |
||||
*/ |
||||
static ScoringFunction unspecified() { |
||||
return UnspecifiedScoringFunction.INSTANCE; |
||||
} |
||||
|
||||
/** |
||||
* Return the Euclidean distance scoring function. |
||||
* <p> |
||||
* Calculates the L2 norm (straight-line distance) between two vectors. |
||||
* |
||||
* @return the {@code ScoringFunction} based on Euclidean distance. |
||||
*/ |
||||
static ScoringFunction euclidean() { |
||||
return VectorScoringFunctions.EUCLIDEAN; |
||||
} |
||||
|
||||
/** |
||||
* Return the cosine similarity scoring function. |
||||
* <p> |
||||
* Measures the cosine of the angle between two vectors, independent of magnitude. |
||||
* |
||||
* @return the {@code ScoringFunction} based on cosine similarity. |
||||
*/ |
||||
static ScoringFunction cosine() { |
||||
return VectorScoringFunctions.COSINE; |
||||
} |
||||
|
||||
/** |
||||
* Return the dot product (also known as inner product) scoring function. |
||||
* <p> |
||||
* Computes the algebraic product of two vectors, considering both direction and magnitude. |
||||
* |
||||
* @return the {@code ScoringFunction} based on dot product. |
||||
*/ |
||||
static ScoringFunction dotProduct() { |
||||
return VectorScoringFunctions.DOT_PRODUCT; |
||||
} |
||||
|
||||
/** |
||||
* Return the name of the scoring function. |
||||
* <p> |
||||
* Typically used for display or configuration purposes. |
||||
* |
||||
* @return the identifying name of this scoring function. |
||||
*/ |
||||
String getName(); |
||||
|
||||
} |
||||
@@ -0,0 +1,128 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import java.io.Serial; |
||||
import java.io.Serializable; |
||||
import java.util.function.Function; |
||||
|
||||
import org.jspecify.annotations.Nullable; |
||||
|
||||
import org.springframework.util.Assert; |
||||
import org.springframework.util.ObjectUtils; |
||||
|
||||
/** |
||||
* Immutable value object representing a search result consisting of a content item and an associated {@link Score}. |
||||
* <p> |
||||
* Typically used in the context of similarity-based or vector search operations where each result carries a relevance |
||||
* {@link Score}. Provides accessor methods for the content and its score, along with transformation support via |
||||
* {@link #map(Function)}. |
||||
* |
||||
* @param <T> the type of the content object |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
* @see Score |
||||
* @see Similarity |
||||
*/ |
||||
public final class SearchResult<T> implements Serializable { |
||||
|
||||
private static final @Serial long serialVersionUID = 1637452570977581370L; |
||||
|
||||
private final T content; |
||||
private final Score score; |
||||
|
||||
/** |
||||
* Creates a new {@link SearchResult} with the given content and {@link Score}. |
||||
* |
||||
* @param content the result content, must not be {@literal null}. |
||||
* @param score the result score, must not be {@literal null}. |
||||
*/ |
||||
public SearchResult(T content, Score score) { |
||||
|
||||
Assert.notNull(content, "Content must not be null"); |
||||
Assert.notNull(score, "Score must not be null"); |
||||
|
||||
this.content = content; |
||||
this.score = score; |
||||
} |
||||
|
||||
/** |
||||
* Create a new {@link SearchResult} with the given content and a raw score value. |
||||
* |
||||
* @param content the result content, must not be {@literal null}. |
||||
* @param score the score value. |
||||
*/ |
||||
public SearchResult(T content, double score) { |
||||
this(content, Score.of(score)); |
||||
} |
||||
|
||||
/** |
||||
* Returns the content associated with this result. |
||||
*/ |
||||
public T getContent() { |
||||
return this.content; |
||||
} |
||||
|
||||
/** |
||||
* Returns the {@link Score} associated with this result. |
||||
*/ |
||||
public Score getScore() { |
||||
return this.score; |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link SearchResult} by applying the given mapping {@link Function} to this result's content. |
||||
* |
||||
* @param converter the mapping function to apply to the content, must not be {@literal null}. |
||||
* @return a new {@link SearchResult} instance with converted content. |
||||
* @param <U> the target type of the mapped content. |
||||
*/ |
||||
public <U> SearchResult<U> map(Function<? super T, ? extends U> converter) { |
||||
|
||||
Assert.notNull(converter, "Function must not be null"); |
||||
|
||||
return new SearchResult<>(converter.apply(getContent()), getScore()); |
||||
} |
||||
|
||||
@Override |
||||
public boolean equals(@Nullable Object o) { |
||||
|
||||
if (this == o) { |
||||
return true; |
||||
} |
||||
|
||||
if (!(o instanceof SearchResult<?> result)) { |
||||
return false; |
||||
} |
||||
|
||||
if (!ObjectUtils.nullSafeEquals(content, result.content)) { |
||||
return false; |
||||
} |
||||
|
||||
return ObjectUtils.nullSafeEquals(score, result.score); |
||||
} |
||||
|
||||
@Override |
||||
public int hashCode() { |
||||
return ObjectUtils.nullSafeHash(content, score); |
||||
} |
||||
|
||||
@Override |
||||
public String toString() { |
||||
return String.format("SearchResult [content: %s, score: %s]", content, score); |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,130 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import java.io.Serializable; |
||||
import java.util.Collections; |
||||
import java.util.Iterator; |
||||
import java.util.List; |
||||
import java.util.function.Function; |
||||
import java.util.stream.Collectors; |
||||
import java.util.stream.Stream; |
||||
|
||||
import org.springframework.data.util.Streamable; |
||||
import org.springframework.util.Assert; |
||||
import org.springframework.util.ObjectUtils; |
||||
import org.springframework.util.StringUtils; |
||||
|
||||
/** |
||||
* Value object encapsulating a collection of {@link SearchResult} instances. |
||||
* <p> |
||||
* Typically used as the result type for search or similarity queries, exposing access to the result content and |
||||
* supporting mapping operations to transform the result content type. |
||||
* |
||||
* @param <T> the type of content contained within each {@link SearchResult}. |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
* @see SearchResult |
||||
*/ |
||||
public class SearchResults<T> implements Iterable<SearchResult<T>>, Serializable { |
||||
|
||||
private final List<? extends SearchResult<T>> results; |
||||
|
||||
/** |
||||
* Creates a new {@link SearchResults} instance from the given list of {@link SearchResult} items. |
||||
* |
||||
* @param results the search results to encapsulate, must not be {@code null} |
||||
*/ |
||||
public SearchResults(List<? extends SearchResult<T>> results) { |
||||
Assert.notNull(results, "Results must not be null"); // enforce the documented contract
this.results = results;
||||
} |
||||
|
||||
/** |
||||
* Return the actual content of the {@link SearchResult} items as an unmodifiable list. |
||||
*/ |
||||
public List<SearchResult<T>> getContent() { |
||||
return Collections.unmodifiableList(results); |
||||
} |
||||
|
||||
@Override |
||||
@SuppressWarnings("unchecked") |
||||
public Iterator<SearchResult<T>> iterator() { |
||||
return (Iterator<SearchResult<T>>) results.iterator(); |
||||
} |
||||
|
||||
/** |
||||
* Returns a sequential {@link Stream} containing {@link SearchResult} items in this {@code SearchResults} instance. |
||||
* |
||||
* @return a sequential {@link Stream} containing {@link SearchResult} items in this {@code SearchResults} instance. |
||||
*/ |
||||
public Stream<SearchResult<T>> stream() { |
||||
return Streamable.of(this).stream(); |
||||
} |
||||
|
||||
/** |
||||
* Returns a sequential {@link Stream} containing {@link #getContent() unwrapped content} items in this |
||||
* {@code SearchResults} instance. |
||||
* |
||||
* @return a sequential {@link Stream} containing {@link #getContent() unwrapped content} items in this |
||||
* {@code SearchResults} instance. |
||||
*/ |
||||
public Stream<T> contentStream() { |
||||
return getContent().stream().map(SearchResult::getContent); |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@code SearchResults} instance with the content of the current results mapped via the given |
||||
* {@link Function}. |
||||
* |
||||
* @param converter the mapping function to apply to the content of each {@link SearchResult}, must not be |
||||
* {@literal null}. |
||||
* @param <U> the target type of the mapped content. |
||||
* @return a new {@code SearchResults} instance containing mapped result content. |
||||
*/ |
||||
public <U> SearchResults<U> map(Function<? super T, ? extends U> converter) { |
||||
|
||||
Assert.notNull(converter, "Function must not be null"); |
||||
|
||||
List<SearchResult<U>> result = results.stream().map(it -> it.<U> map(converter)).collect(Collectors.toList()); |
||||
|
||||
return new SearchResults<>(result); |
||||
} |
||||
|
||||
@Override |
||||
public boolean equals(Object o) { |
||||
|
||||
if (o == this) { |
||||
return true; |
||||
} |
||||
|
||||
if (!(o instanceof SearchResults<?> that)) { |
||||
return false; |
||||
} |
||||
return ObjectUtils.nullSafeEquals(results, that.results); |
||||
} |
||||
|
||||
@Override |
||||
public int hashCode() { |
||||
return ObjectUtils.nullSafeHashCode(results); |
||||
} |
||||
|
||||
@Override |
||||
public String toString() { |
||||
return results.isEmpty() ? "SearchResults: [empty]" |
||||
: String.format("SearchResults: [results: %s]", StringUtils.collectionToCommaDelimitedString(results)); |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,133 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0
|
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import org.springframework.util.Assert; |
||||
|
||||
/** |
||||
* Value object representing a normalized similarity score determined by a {@link ScoringFunction}. |
||||
* <p> |
||||
* Similarity values are constrained to the range {@code [0.0, 1.0]}, where {@code 0.0} denotes the least similarity and |
||||
* {@code 1.0} the maximum similarity. This normalization allows for consistent comparison of similarity scores across |
||||
* different scoring models and systems. |
||||
* <p> |
||||
* Primarily used in vector search and approximate nearest neighbor arrangements where results are ranked based on |
||||
* normalized relevance. Vector searches typically return a collection of results ordered by their similarity to the |
||||
* query vector. |
||||
* <p> |
||||
* This class is designed for use in information retrieval contexts, recommendation systems, and other applications |
||||
* requiring normalized comparison of results. |
||||
* <p> |
||||
* A {@code Similarity} instance includes both the similarity {@code value} and information about the |
||||
* {@link ScoringFunction} used to generate it, providing context for proper interpretation of the score. |
||||
* <p> |
||||
* Instances are immutable and support range-based comparisons, making them suitable for filtering and ranking |
||||
* operations. The class extends {@link Score} to inherit common scoring functionality while adding similarity-specific |
||||
* semantics. |
||||
* |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
* @see Score |
||||
*/ |
||||
public final class Similarity extends Score { |
||||
|
||||
private Similarity(double value, ScoringFunction function) { |
||||
super(value, function); |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link Similarity} from a plain {@code similarity} value using {@link ScoringFunction#unspecified()}. |
||||
* |
||||
* @param similarity the similarity value without a specific {@link ScoringFunction}, ranging between {@code 0} and |
||||
* {@code 1}. |
||||
* @return the new {@link Similarity}. |
||||
*/ |
||||
public static Similarity of(double similarity) { |
||||
return of(similarity, ScoringFunction.unspecified()); |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link Similarity} from a raw value and the associated {@link ScoringFunction}. |
||||
* |
||||
* @param similarity the similarity value in the {@code [0.0, 1.0]} range. |
||||
* @param function the scoring function that produced this similarity. |
||||
* @return a new {@link Similarity} instance. |
||||
* @throws IllegalArgumentException if the value is outside the allowed range. |
||||
*/ |
||||
public static Similarity of(double similarity, ScoringFunction function) { |
||||
|
||||
Assert.isTrue(similarity >= 0.0 && similarity <= 1.0, "Similarity must be in [0,1] range."); |
||||
|
||||
return new Similarity(similarity, function); |
||||
} |
||||
|
||||
/** |
||||
* Create a raw {@link Similarity} value without validation. |
||||
* <p> |
||||
* Intended for use when accepting similarity values from trusted sources such as search engines or databases. |
||||
* |
||||
* @param similarity the similarity value, expected to be in the {@code [0.0, 1.0]} range but not validated.
||||
* @param function the scoring function that produced this similarity. |
||||
* @return a new {@link Similarity} instance. |
||||
*/ |
||||
public static Similarity raw(double similarity, ScoringFunction function) { |
||||
return new Similarity(similarity, function); |
||||
} |
||||
|
||||
/** |
||||
* Creates a {@link Range} between the given {@link Similarity}. |
||||
* |
||||
* @param min lower value. |
||||
* @param max upper value. |
||||
* @return the {@link Range} between the given values. |
||||
*/ |
||||
public static Range<Similarity> between(Similarity min, Similarity max) { |
||||
return Range.from(Range.Bound.inclusive(min)).to(Range.Bound.inclusive(max)); |
||||
} |
||||
|
||||
/** |
||||
* Creates a new {@link Range} by creating minimum and maximum {@link Similarity} from the given values |
||||
* {@link ScoringFunction#unspecified() without specifying} a specific scoring function. |
||||
* |
||||
* @param minValue lower value, ranging between {@code 0} and {@code 1}. |
||||
* @param maxValue upper value, ranging between {@code 0} and {@code 1}. |
||||
* @return the {@link Range} between the given values. |
||||
*/ |
||||
public static Range<Similarity> between(double minValue, double maxValue) { |
||||
return between(minValue, maxValue, ScoringFunction.unspecified()); |
||||
} |
||||
|
||||
/** |
||||
* Creates a {@link Range} of {@link Similarity} values from the given values and scoring function. Both values are validated through {@link #of(double, ScoringFunction)}. |
||||
* |
||||
* @param minValue the lower similarity value. |
||||
* @param maxValue the upper similarity value. |
||||
* @param function the scoring function to associate with the values. |
||||
* @return a {@link Range} of {@link Similarity} values. |
||||
*/ |
||||
public static Range<Similarity> between(double minValue, double maxValue, ScoringFunction function) { |
||||
return between(Similarity.of(minValue, function), Similarity.of(maxValue, function)); |
||||
} |
||||
|
||||
@Override |
||||
public boolean equals(Object o) { |
||||
if (!(o instanceof Similarity other)) { |
||||
return false; |
||||
} |
||||
return super.equals(other); |
||||
} |
||||
|
||||
} |
||||
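The difference between the validating `of(…)` factory and the non-validating `raw(…)` factory above can be illustrated with a minimal standalone sketch. `SimilaritySketch` is a hypothetical name used only for illustration; it is not the Spring Data class, it omits the scoring function, and it merely mirrors the range check shown in the diff.

```java
// Standalone sketch, not the Spring Data class: SimilaritySketch is a hypothetical name.
final class SimilaritySketch {

	private final double value;

	private SimilaritySketch(double value) {
		this.value = value;
	}

	// Validating factory: rejects values outside [0, 1] (and NaN).
	static SimilaritySketch of(double value) {
		if (!(value >= 0.0 && value <= 1.0)) {
			throw new IllegalArgumentException("Similarity must be in [0,1] range.");
		}
		return new SimilaritySketch(value);
	}

	// Non-validating factory: accepts whatever the caller passes in.
	static SimilaritySketch raw(double value) {
		return new SimilaritySketch(value);
	}

	double getValue() {
		return value;
	}
}
```

Under these assumptions, `SimilaritySketch.of(1.01)` throws an `IllegalArgumentException`, while `SimilaritySketch.raw(2)` is accepted as-is, which matches what the `SimilarityUnitTests` in this change assert for the real type.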
@@ -0,0 +1,46 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0 |
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import java.io.Serializable; |
||||
|
||||
class UnspecifiedScoringFunction implements ScoringFunction, Serializable { |
||||
|
||||
static final UnspecifiedScoringFunction INSTANCE = new UnspecifiedScoringFunction(); |
||||
|
||||
private UnspecifiedScoringFunction() {} |
||||
|
||||
@Override |
||||
public String getName() { |
||||
return "Unspecified"; |
||||
} |
||||
|
||||
@Override |
||||
public boolean equals(Object o) { |
||||
return o instanceof UnspecifiedScoringFunction; |
||||
} |
||||
|
||||
@Override |
||||
public int hashCode() { |
||||
return 32; |
||||
} |
||||
|
||||
@Override |
||||
public String toString() { |
||||
return "UNSPECIFIED"; |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,92 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0 |
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
/** |
||||
* Commonly used {@link ScoringFunction} implementations for vector-based similarity computations. |
||||
* <p> |
||||
* Provides a set of standard scoring strategies for comparing vectors in search or matching operations. Includes |
||||
* options such as Euclidean distance, cosine similarity, and dot product. |
||||
* <p> |
||||
* These constants are intended for reuse across components requiring vector scoring semantics. Each scoring function |
||||
* represents a mathematical approach to quantifying the similarity or distance between vectors in a multidimensional |
||||
* space. |
||||
* <p> |
||||
* When selecting a scoring function, consider the specific requirements of your application domain: |
||||
* <ul> |
||||
* <li>For spatial distance measurements where magnitude matters, use {@link #EUCLIDEAN}.</li> |
||||
* <li>For directional similarity irrespective of magnitude, use {@link #COSINE}.</li> |
||||
* <li>For efficient high-dimensional calculations, use {@link #DOT_PRODUCT}.</li> |
||||
* <li>For grid-based or axis-aligned problems, use {@link #TAXICAB}.</li> |
||||
* <li>For binary vector or string comparisons, use {@link #HAMMING}.</li> |
||||
* </ul> |
||||
* The choice of scoring function can significantly impact the relevance of the results returned by a Vector Search |
||||
* query. {@code ScoringFunction} and score values are typically subject to fine-tuning during development to |
||||
* achieve optimal performance and accuracy. |
||||
* |
||||
* @author Mark Paluch |
||||
* @since 4.0 |
||||
*/ |
||||
public enum VectorScoringFunctions implements ScoringFunction { |
||||
|
||||
/** |
||||
* Scoring based on the <a href="https://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a> between two |
||||
* vectors. |
||||
* <p> |
||||
* Computes the L2 norm, involving a square root operation. Typically more computationally expensive than |
||||
* {@link #COSINE} or {@link #DOT_PRODUCT}, but precise in spatial distance measurement. |
||||
*/ |
||||
EUCLIDEAN, |
||||
|
||||
/** |
||||
* Scoring based on <a href="https://en.wikipedia.org/wiki/Cosine_distance">cosine similarity</a> between two vectors. |
||||
* <p> |
||||
* Measures the angle between vectors, independent of their magnitude. Involves a {@link #DOT_PRODUCT} and |
||||
* normalization, offering a balance between precision and performance. |
||||
*/ |
||||
COSINE, |
||||
|
||||
/** |
||||
* Scoring based on the <a href="https://en.wikipedia.org/wiki/Dot_product">dot product</a> (also known as inner |
||||
* product) between two vectors. |
||||
* <p> |
||||
* Efficient to compute and particularly useful in high-dimensional vector spaces. |
||||
*/ |
||||
DOT_PRODUCT, |
||||
|
||||
/** |
||||
* Scoring based on <a href="https://en.wikipedia.org/wiki/Taxicab_geometry">taxicab (Manhattan) distance</a>. |
||||
* <p> |
||||
* Computes the sum of absolute differences across dimensions. Useful in contexts where axis-aligned movement or L1 |
||||
* norms are preferred. |
||||
*/ |
||||
TAXICAB, |
||||
|
||||
/** |
||||
* Scoring based on the <a href="https://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a> between two |
||||
* vectors or strings. |
||||
* <p> |
||||
* Counts the number of differing positions. Suitable for binary (bitwise) vectors or fixed-length character |
||||
* sequences. |
||||
*/ |
||||
HAMMING; |
||||
|
||||
@Override |
||||
public String getName() { |
||||
return name(); |
||||
} |
||||
|
||||
} |
||||
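The enum above only names the scoring strategies; the computation itself is left to the underlying store. As a rough illustration of what each constant stands for mathematically, here is a minimal sketch over plain `double[]` arrays. `ScoringSketch` and its method names are hypothetical and not part of Spring Data, and the sketch returns raw distances and products rather than normalized scores.

```java
// Standalone sketch; ScoringSketch is a hypothetical helper, not Spring Data API.
final class ScoringSketch {

	private ScoringSketch() {}

	// Dot (inner) product: sum of element-wise products.
	static double dotProduct(double[] a, double[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			sum += a[i] * b[i];
		}
		return sum;
	}

	// Euclidean (L2) distance: square root of the summed squared differences.
	static double euclidean(double[] a, double[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			double d = a[i] - b[i];
			sum += d * d;
		}
		return Math.sqrt(sum);
	}

	// Cosine similarity: dot product divided by the product of both magnitudes.
	// Yields NaN if either vector has zero magnitude.
	static double cosine(double[] a, double[] b) {
		return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
	}

	// Taxicab (Manhattan, L1) distance: sum of absolute differences.
	static double taxicab(double[] a, double[] b) {
		requireSameLength(a, b);
		double sum = 0;
		for (int i = 0; i < a.length; i++) {
			sum += Math.abs(a[i] - b[i]);
		}
		return sum;
	}

	// Hamming distance: number of positions at which the elements differ.
	static int hamming(double[] a, double[] b) {
		requireSameLength(a, b);
		int count = 0;
		for (int i = 0; i < a.length; i++) {
			if (a[i] != b[i]) {
				count++;
			}
		}
		return count;
	}

	private static void requireSameLength(double[] a, double[] b) {
		if (a.length != b.length) {
			throw new IllegalArgumentException("Vectors must have the same dimension.");
		}
	}
}
```

Note that Euclidean, taxicab, and Hamming values are distances (smaller means closer), whereas cosine and dot product grow with similarity; how a given database maps these onto a score is store-specific and not shown here.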
@@ -0,0 +1,69 @@
|
||||
/* |
||||
* Copyright 2011-2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0 |
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import static org.assertj.core.api.Assertions.*; |
||||
|
||||
import org.junit.jupiter.api.Test; |
||||
|
||||
import org.springframework.util.SerializationUtils; |
||||
|
||||
/** |
||||
* Unit tests for {@link SearchResult}. |
||||
* |
||||
* @author Mark Paluch |
||||
*/ |
||||
class SearchResultUnitTests { |
||||
|
||||
SearchResult<String> first = new SearchResult<>("Foo", Score.of(2.5)); |
||||
SearchResult<String> second = new SearchResult<>("Foo", Score.of(2.5)); |
||||
SearchResult<String> third = new SearchResult<>("Bar", Score.of(2.5)); |
||||
SearchResult<String> fourth = new SearchResult<>("Foo", Score.of(5.2)); |
||||
|
||||
@Test // GH-
|
||||
void considersSameInstanceEqual() { |
||||
assertThat(first.equals(first)).isTrue(); |
||||
} |
||||
|
||||
@Test // GH-
|
||||
void considersSameValuesAsEqual() { |
||||
|
||||
assertThat(first.equals(second)).isTrue(); |
||||
assertThat(second.equals(first)).isTrue(); |
||||
assertThat(first.equals(third)).isFalse(); |
||||
assertThat(third.equals(first)).isFalse(); |
||||
assertThat(first.equals(fourth)).isFalse(); |
||||
assertThat(fourth.equals(first)).isFalse(); |
||||
} |
||||
|
||||
@Test |
||||
@SuppressWarnings({ "rawtypes", "unchecked" }) |
||||
// GH-
|
||||
void rejectsNullContent() { |
||||
assertThatIllegalArgumentException().isThrownBy(() -> new SearchResult(null, Score.of(2.5))); |
||||
} |
||||
|
||||
@Test // GH-
|
||||
@SuppressWarnings("unchecked") |
||||
void testSerialization() { |
||||
|
||||
var result = new SearchResult<>("test", Score.of(2d)); |
||||
|
||||
var serialized = (SearchResult<String>) SerializationUtils.deserialize(SerializationUtils.serialize(result)); |
||||
assertThat(serialized).isEqualTo(result); |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,69 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0 |
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import static org.assertj.core.api.Assertions.*; |
||||
|
||||
import java.util.Arrays; |
||||
import java.util.Collections; |
||||
import java.util.List; |
||||
|
||||
import org.junit.jupiter.api.Test; |
||||
|
||||
import org.springframework.util.SerializationUtils; |
||||
|
||||
/** |
||||
* Unit tests for {@link SearchResults}. |
||||
* |
||||
* @author Mark Paluch |
||||
*/ |
||||
class SearchResultsUnitTests { |
||||
|
||||
@SuppressWarnings("unchecked") |
||||
@Test // GH-
|
||||
void testSerialization() { |
||||
|
||||
var result = new SearchResult<>("test", Score.of(2)); |
||||
var searchResults = new SearchResults<>(Collections.singletonList(result)); |
||||
|
||||
var serialized = (SearchResults<String>) SerializationUtils |
||||
.deserialize(SerializationUtils.serialize(searchResults)); |
||||
assertThat(serialized).isEqualTo(searchResults); |
||||
} |
||||
|
||||
@SuppressWarnings("unchecked") |
||||
@Test // GH-
|
||||
void testStream() { |
||||
|
||||
var result = new SearchResult<>("test", Score.of(2)); |
||||
var searchResults = new SearchResults<>(Collections.singletonList(result)); |
||||
|
||||
List<SearchResult<String>> list = searchResults.stream().toList(); |
||||
assertThat(list).isEqualTo(searchResults.getContent()); |
||||
} |
||||
|
||||
@SuppressWarnings("unchecked") |
||||
@Test // GH-
|
||||
void testContentStream() { |
||||
|
||||
var result = new SearchResult<>("test", Score.of(2)); |
||||
var searchResults = new SearchResults<>(Collections.singletonList(result)); |
||||
|
||||
List<String> list = searchResults.contentStream().toList(); |
||||
assertThat(list).isEqualTo(Arrays.asList(result.getContent())); |
||||
} |
||||
|
||||
} |
||||
@@ -0,0 +1,89 @@
|
||||
/* |
||||
* Copyright 2025 the original author or authors. |
||||
* |
||||
* Licensed under the Apache License, Version 2.0 (the "License"); |
||||
* you may not use this file except in compliance with the License. |
||||
* You may obtain a copy of the License at |
||||
* |
||||
* https://www.apache.org/licenses/LICENSE-2.0 |
||||
* |
||||
* Unless required by applicable law or agreed to in writing, software |
||||
* distributed under the License is distributed on an "AS IS" BASIS, |
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
* See the License for the specific language governing permissions and |
||||
* limitations under the License. |
||||
*/ |
||||
package org.springframework.data.domain; |
||||
|
||||
import static org.assertj.core.api.Assertions.*; |
||||
|
||||
import org.junit.jupiter.api.Test; |
||||
|
||||
/** |
||||
* Unit tests for {@link Similarity}. |
||||
* |
||||
* @author Mark Paluch |
||||
*/ |
||||
class SimilarityUnitTests { |
||||
|
||||
@Test |
||||
void shouldBeBounded() { |
||||
|
||||
assertThatIllegalArgumentException().isThrownBy(() -> Similarity.of(-1)); |
||||
assertThatIllegalArgumentException().isThrownBy(() -> Similarity.of(1.01)); |
||||
} |
||||
|
||||
@Test |
||||
void shouldConstructRawSimilarity() { |
||||
|
||||
Similarity similarity = Similarity.raw(2, ScoringFunction.unspecified()); |
||||
|
||||
assertThat(similarity.getValue()).isEqualTo(2); |
||||
} |
||||
|
||||
@Test |
||||
void shouldConstructGenericSimilarity() { |
||||
|
||||
Similarity similarity = Similarity.of(1); |
||||
|
||||
assertThat(similarity).isEqualTo(Similarity.of(1)).isNotEqualTo(Score.of(1)).isNotEqualTo(Similarity.of(0.5)); |
||||
assertThat(similarity).hasToString("1.0"); |
||||
assertThat(similarity.getFunction()).isEqualTo(ScoringFunction.unspecified()); |
||||
} |
||||
|
||||
@Test |
||||
void shouldConstructMeteredSimilarity() { |
||||
|
||||
Similarity similarity = Similarity.of(1, VectorScoringFunctions.COSINE); |
||||
|
||||
assertThat(similarity).isEqualTo(Similarity.of(1, VectorScoringFunctions.COSINE)) |
||||
.isNotEqualTo(Score.of(1, VectorScoringFunctions.COSINE)).isNotEqualTo(Similarity.of(1)); |
||||
assertThat(similarity).hasToString("1.0 (COSINE)"); |
||||
assertThat(similarity.getFunction()).isEqualTo(VectorScoringFunctions.COSINE); |
||||
} |
||||
|
||||
@Test |
||||
void shouldConstructRange() { |
||||
|
||||
Range<Similarity> range = Similarity.between(0.5, 1); |
||||
|
||||
assertThat(range.getLowerBound().getValue()).contains(Similarity.of(0.5)); |
||||
assertThat(range.getLowerBound().isInclusive()).isTrue(); |
||||
|
||||
assertThat(range.getUpperBound().getValue()).contains(Similarity.of(1)); |
||||
assertThat(range.getUpperBound().isInclusive()).isTrue(); |
||||
} |
||||
|
||||
@Test |
||||
void shouldConstructRangeWithFunction() { |
||||
|
||||
Range<Similarity> range = Similarity.between(0.5, 1, VectorScoringFunctions.COSINE); |
||||
|
||||
assertThat(range.getLowerBound().getValue()).contains(Similarity.of(0.5, VectorScoringFunctions.COSINE)); |
||||
assertThat(range.getLowerBound().isInclusive()).isTrue(); |
||||
|
||||
assertThat(range.getUpperBound().getValue()).contains(Similarity.of(1, VectorScoringFunctions.COSINE)); |
||||
assertThat(range.getUpperBound().isInclusive()).isTrue(); |
||||
} |
||||
|
||||
} |
||||