Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for builder constructor in neural query builder #1047

1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Support new knn query parameter expand_nested ([#1013](https://github.com/opensearch-project/neural-search/pull/1013))
- Implement pruning for neural sparse ingestion pipeline and two phase search processor ([#988](https://github.com/opensearch-project/neural-search/pull/988))
- Support empty string for fields in text embedding processor ([#1041](https://github.com/opensearch-project/neural-search/pull/1041))
- Support for builder constructor in Neural Query Builder ([#1047](https://github.com/opensearch-project/neural-search/pull/1047))
### Bug Fixes
- Address inconsistent scoring in hybrid query results ([#998](https://github.com/opensearch-project/neural-search/pull/998))
### Infrastructure
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -124,11 +124,12 @@ private HybridQueryBuilder getQueryBuilder(
final Map<String, ?> methodParameters,
final RescoreContext rescoreContext
) {
NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder();
neuralQueryBuilder.fieldName("passage_embedding");
neuralQueryBuilder.modelId(modelId);
neuralQueryBuilder.queryText(QUERY);
neuralQueryBuilder.k(5);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName("passage_embedding")
.modelId(modelId)
.queryText(QUERY)
.k(5)
.build();
if (expandNestedDocs != null) {
neuralQueryBuilder.expandNested(expandNestedDocs);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,8 +67,8 @@
@Getter
@Setter
@Accessors(chain = true, fluent = true)
@NoArgsConstructor
@AllArgsConstructor
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
public class NeuralQueryBuilder extends AbstractQueryBuilder<NeuralQueryBuilder> implements ModelInferenceQueryBuilder {

public static final String NAME = "neural";
Expand Down Expand Up @@ -108,6 +108,126 @@ public static void initialize(MLCommonsClientAccessor mlClient) {
private Map<String, ?> methodParameters;
private RescoreContext rescoreContext;

/**
* A custom builder class to enforce valid Neural Query Builder instance by validating the required fields are initialized
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
*/
public static class Builder {
private String fieldName;
private String queryText;
private String queryImage;
private String modelId;
private Integer k = null;
private Float maxDistance = null;
private Float minScore = null;
private Boolean expandNested;
private Supplier<float[]> vectorSupplier;
private QueryBuilder filter;
private Map<String, ?> methodParameters;
private RescoreContext rescoreContext;
private String queryName;
private float boost = DEFAULT_BOOST;

public Builder() {}

public Builder fieldName(String fieldName) {
this.fieldName = fieldName;
return this;
}

public Builder queryText(String queryText) {
this.queryText = queryText;
return this;
}

public Builder queryImage(String queryImage) {
this.queryImage = queryImage;
return this;
}

public Builder modelId(String modelId) {
this.modelId = modelId;
return this;
}

public Builder k(Integer k) {
this.k = k;
return this;
}

public Builder maxDistance(Float maxDistance) {
this.maxDistance = maxDistance;
return this;
}

public Builder minScore(Float minScore) {
this.minScore = minScore;
return this;
}

public Builder expandNested(Boolean expandNested) {
this.expandNested = expandNested;
return this;
}

public Builder vectorSupplier(Supplier<float[]> vectorSupplier) {
this.vectorSupplier = vectorSupplier;
return this;
}

public Builder filter(QueryBuilder filter) {
this.filter = filter;
return this;
}

public Builder methodParameters(Map<String, ?> methodParameters) {
this.methodParameters = methodParameters;
return this;
}

public Builder queryName(String queryName) {
this.queryName = queryName;
return this;
}

public Builder boost(float boost) {
this.boost = boost;
return this;
}

public Builder rescoreContext(RescoreContext rescoreContext) {
this.rescoreContext = rescoreContext;
return this;
}

public NeuralQueryBuilder build() {
validateQueryParameters(fieldName, queryText, queryImage);
int k = this.k == null ? DEFAULT_K : this.k;
boolean queryTypeIsProvided = validateKNNQueryType(k, maxDistance, minScore);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we're adding this logic here, it wasn't part of the original constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of this change, we wanted to add validation before instantiation so ensure that only valid NeuralSearchBuilder can be instantiated. See previous comment here: #1047 (comment)

Let me know your thoughts!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm rely on @junqiu-lei expertise here, I'm good if he's good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need int k = this.k == null ? DEFAULT_K : this.k; at line 204.

If the default k is provided before validateKNNQueryType, when user use radial search parameter(only provide minScore or maxDistance), then validateKNNQueryType will return false and set to use default k at line 207.

Copy link
Member

@junqiu-lei junqiu-lei Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI is failed in radial integ test due to this change:

NeuralQueryIT > testQueryWithBoostAndImageQueryAndRadialQuery FAILED
    java.lang.IllegalArgumentException: Only one of k, max_distance, or min_score can be provided
        at __randomizedtesting.SeedInfo.seed([D0122DFF0BE48EA8:885F5A920D6E8D5B]:0)
        at org.opensearch.neuralsearch.query.NeuralQueryBuilder.validateKNNQueryType(NeuralQueryBuilder.java:540)
        at org.opensearch.neuralsearch.query.NeuralQueryBuilder$Builder.build(NeuralQueryBuilder.java:205)
        at org.opensearch.neuralsearch.query.NeuralQueryIT.testQueryWithBoostAndImageQueryAndRadialQuery(NeuralQueryIT.java:151)

if (queryTypeIsProvided == false) {
k = DEFAULT_K;
}
return new NeuralQueryBuilder(
fieldName,
queryText,
queryImage,
modelId,
k,
maxDistance,
minScore,
expandNested,
vectorSupplier,
filter,
methodParameters,
rescoreContext
).boost(boost).queryName(queryName);
}

}

public static NeuralQueryBuilder.Builder builder() {
return new NeuralQueryBuilder.Builder();
}

/**
* Constructor from stream input
*
Expand Down Expand Up @@ -246,15 +366,16 @@ public static NeuralQueryBuilder fromXContent(XContentParser parser) throws IOEx
+ "]"
);
}
if (StringUtils.isBlank(neuralQueryBuilder.queryText()) && StringUtils.isBlank(neuralQueryBuilder.queryImage())) {
throw new IllegalArgumentException("Either query text or image text must be provided for neural query");
}
requireValue(neuralQueryBuilder.fieldName(), "Field name must be provided for neural query");
validateQueryParameters(neuralQueryBuilder.fieldName(), neuralQueryBuilder.queryText(), neuralQueryBuilder.queryImage());
if (!isClusterOnOrAfterMinReqVersionForDefaultModelIdSupport()) {
requireValue(neuralQueryBuilder.modelId(), "Model ID must be provided for neural query");
}

boolean queryTypeIsProvided = validateKNNQueryType(neuralQueryBuilder);
boolean queryTypeIsProvided = validateKNNQueryType(
neuralQueryBuilder.k(),
neuralQueryBuilder.maxDistance(),
neuralQueryBuilder.minScore()
);
if (queryTypeIsProvided == false) {
neuralQueryBuilder.k(DEFAULT_K);
}
Expand Down Expand Up @@ -397,15 +518,22 @@ public String getWriteableName() {
return NAME;
}

private static boolean validateKNNQueryType(NeuralQueryBuilder neuralQueryBuilder) {
private static void validateQueryParameters(String fieldName, String queryText, String queryImage) {
if (StringUtils.isBlank(queryText) && StringUtils.isBlank(queryImage)) {
throw new IllegalArgumentException("Either query text or image text must be provided for neural query");
}
requireValue(fieldName, "Field name must be provided for neural query");
}

private static boolean validateKNNQueryType(Integer k, Float maxDistance, Float minScore) {
int queryCount = 0;
if (neuralQueryBuilder.k() != null) {
if (k != null) {
queryCount++;
}
if (neuralQueryBuilder.maxDistance() != null) {
if (maxDistance != null) {
queryCount++;
}
if (neuralQueryBuilder.minScore() != null) {
if (minScore != null) {
queryCount++;
}
if (queryCount > 1) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,10 +54,11 @@ public void testNeuralQueryEnricherProcessor_whenNoModelIdPassed_thenSuccess() {
createSearchRequestProcessor(modelId, search_pipeline);
createPipelineProcessor(modelId, ingest_pipeline, ProcessorType.TEXT_EMBEDDING);
updateIndexSettings(index, Settings.builder().put("index.search.default_pipeline", search_pipeline));
NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder();
neuralQueryBuilder.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1);
neuralQueryBuilder.queryText("Hello World");
neuralQueryBuilder.k(1);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1)
.queryText("Hello World")
.k(1)
.build();
Map<String, Object> response = search(index, neuralQueryBuilder, 2);
assertFalse(response.isEmpty());
} finally {
Expand Down Expand Up @@ -112,10 +113,11 @@ public void testNeuralQueryEnricherProcessor_whenHybridQueryBuilderAndNoModelIdP
createSearchRequestProcessor(modelId, search_pipeline);
createPipelineProcessor(modelId, ingest_pipeline, ProcessorType.TEXT_EMBEDDING);
updateIndexSettings(index, Settings.builder().put("index.search.default_pipeline", search_pipeline));
NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder();
neuralQueryBuilder.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1);
neuralQueryBuilder.queryText("Hello World");
neuralQueryBuilder.k(1);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1)
.queryText("Hello World")
.k(1)
.build();
HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder();
hybridQueryBuilder.add(neuralQueryBuilder);
Map<String, Object> response = search(index, hybridQueryBuilder, 2);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ public void testFactory_whenModelIdIsNotString_thenFail() {

public void testProcessRequest_whenVisitingQueryBuilder_thenSuccess() throws Exception {
NeuralQueryEnricherProcessor.Factory factory = new NeuralQueryEnricherProcessor.Factory();
NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder();
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder().fieldName("field_name").queryText("query_text").build();
SearchRequest searchRequest = new SearchRequest();
searchRequest.source(new SearchSourceBuilder().query(neuralQueryBuilder));
NeuralQueryEnricherProcessor processor = createTestProcessor(factory);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -87,20 +87,13 @@ public void testResultProcessor_whenOneShardAndQueryMatches_thenSuccessful() {
modelId = prepareModel();
createSearchPipelineWithResultsPostProcessor(SEARCH_PIPELINE);

NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder(
TEST_KNN_VECTOR_FIELD_NAME_1,
TEST_DOC_TEXT1,
"",
modelId,
5,
null,
null,
null,
null,
null,
null,
null
);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1)
.queryText(TEST_DOC_TEXT1)
.modelId(modelId)
.k(5)
.build();

TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery(TEST_TEXT_FIELD_NAME_1, TEST_QUERY_TEXT3);

HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder();
Expand Down Expand Up @@ -140,20 +133,13 @@ public void testResultProcessor_whenDefaultProcessorConfigAndQueryMatches_thenSu
modelId = prepareModel();
createSearchPipelineWithDefaultResultsPostProcessor(SEARCH_PIPELINE);

NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder(
TEST_KNN_VECTOR_FIELD_NAME_1,
TEST_DOC_TEXT1,
"",
modelId,
5,
null,
null,
null,
null,
null,
null,
null
);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1)
.queryText(TEST_DOC_TEXT1)
.modelId(modelId)
.k(5)
.build();

TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery(TEST_TEXT_FIELD_NAME_1, TEST_QUERY_TEXT3);

HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder();
Expand Down Expand Up @@ -182,20 +168,13 @@ public void testQueryMatches_whenMultipleShards_thenSuccessful() {
createSearchPipelineWithResultsPostProcessor(SEARCH_PIPELINE);
int totalExpectedDocQty = 6;

NeuralQueryBuilder neuralQueryBuilder = new NeuralQueryBuilder(
TEST_KNN_VECTOR_FIELD_NAME_1,
TEST_DOC_TEXT1,
"",
modelId,
6,
null,
null,
null,
null,
null,
null,
null
);
NeuralQueryBuilder neuralQueryBuilder = NeuralQueryBuilder.builder()
.fieldName(TEST_KNN_VECTOR_FIELD_NAME_1)
.queryText(TEST_DOC_TEXT1)
.modelId(modelId)
.k(6)
.build();

TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery(TEST_TEXT_FIELD_NAME_1, TEST_QUERY_TEXT3);

HybridQueryBuilder hybridQueryBuilder = new HybridQueryBuilder();
Expand Down
Loading
Loading