Integrates FAISS iterative builds with NativeEngines990KnnVectorsFormat #1950

Merged
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -1,3 +1,4 @@

# CHANGELOG
All notable changes to this project are documented in this file.

@@ -6,7 +7,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
## [Unreleased 3.0](https://github.com/opensearch-project/k-NN/compare/2.x...HEAD)
### Features
### Enhancements
### Bug Fixes
### Infrastructure
* Removed JDK 11 and 17 versions from CI runs [#1921](https://github.com/opensearch-project/k-NN/pull/1921)
### Documentation
@@ -17,10 +18,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
### Features
* Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation [#1945](https://github.com/opensearch-project/k-NN/pull/1945)
### Enhancements
* Adds iterative graph build capability to a faiss index to reduce the memory footprint during indexing, and integrates KNNVectorsFormat for native engines [#1950](https://github.com/opensearch-project/k-NN/pull/1950)
### Bug Fixes
* Corrected search logic for scenario with non-existent fields in filter [#1874](https://github.com/opensearch-project/k-NN/pull/1874)
* Add script_fields context to KNNAllowlist [#1917](https://github.com/opensearch-project/k-NN/pull/1917)
* Fix graph merge stats size calculation [#1844](https://github.com/opensearch-project/k-NN/pull/1844)
* Disallow vector field names containing characters that are invalid in a physical file name [#1936](https://github.com/opensearch-project/k-NN/pull/1936)
### Infrastructure
### Documentation
17 changes: 15 additions & 2 deletions jni/include/commons.h
@@ -19,12 +19,19 @@ namespace knn_jni {
* For subsequent calls you can pass the same memoryAddress. If the data cannot be stored in the memory location,
* an exception is thrown.
*
* append tells the method whether to keep appending to the existing vector. Passing false clears the vector
* without reallocating memory, which reduces memory fragmentation and the overhead of allocating
* and deallocating when the memory address needs to be reused.
*
* CAUTION: The behavior is undefined if the method is called after the memory address has been deallocated.
*
* @param memoryAddress The address of the memory location where data will be stored.
* @param data 2D float array containing data to be stored in native memory.
* @param initialCapacity The initial capacity of the memory location.
* @param append whether to append or start from index 0 when called subsequently with the same address
* @return memory address of std::vector<float> where the data is stored.
*/
jlong storeVectorData(knn_jni::JNIUtilInterface *, JNIEnv *, jlong , jobjectArray, jlong);
jlong storeVectorData(knn_jni::JNIUtilInterface *, JNIEnv *, jlong , jobjectArray, jlong, jboolean);
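The `append` contract described above can be mirrored in plain C++ without JNI. In this sketch, `storeBatch` is a hypothetical stand-in for `storeVectorData`: a `nullptr` buffer plays the role of a 0 memoryAddress, and the nested `std::vector` stands in for the Java `float[][]`. It is an illustration of the allocate-or-reuse pattern, not the plugin's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for storeVectorData without JNI.
// `buffer` plays the role of memoryAddress (nullptr corresponds to 0 on the first call).
std::vector<float>* storeBatch(std::vector<float>* buffer,
                               const std::vector<std::vector<float>>& batch,
                               std::size_t initialCapacity,
                               bool append) {
    if (buffer == nullptr) {
        // First call: allocate the vector once and reserve the expected size.
        buffer = new std::vector<float>();
        buffer->reserve(initialCapacity);
    }
    if (!append) {
        // clear() destroys the elements but leaves capacity() unchanged,
        // so the next batch is written into the same memory block.
        buffer->clear();
    }
    // Flatten the 2D batch row by row; every row must have the same dimension.
    const std::size_t dim = batch.empty() ? 0 : batch.front().size();
    for (const auto& row : batch) {
        assert(row.size() == dim);
        buffer->insert(buffer->end(), row.begin(), row.end());
    }
    return buffer;
}
```

Calling it repeatedly with `append = false` keeps reusing one allocation as long as each batch fits in the reserved capacity, which is the fragmentation and allocation overhead the comment refers to.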

/**
* This is a utility function that can be used to store data in native memory. This function will allocate memory for
@@ -33,12 +40,18 @@ namespace knn_jni {
* For subsequent calls you can pass the same memoryAddress. If the data cannot be stored in the memory location,
* an exception is thrown.
*
* append tells the method whether to keep appending to the existing vector. Passing false clears the vector
* without reallocating memory, which reduces memory fragmentation and the overhead of allocating
* and deallocating when the memory address needs to be reused.
*
* CAUTION: The behavior is undefined if the method is called after the memory address has been deallocated.
*
* @param memoryAddress The address of the memory location where data will be stored.
* @param data 2D byte array containing data to be stored in native memory.
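* @param initialCapacity The initial capacity of the memory location.
* @param append whether to append or start from index 0 when called subsequently with the same address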
* @return memory address of std::vector<uint8_t> where the data is stored.
*/
jlong storeByteVectorData(knn_jni::JNIUtilInterface *, JNIEnv *, jlong , jobjectArray, jlong);
jlong storeByteVectorData(knn_jni::JNIUtilInterface *, JNIEnv *, jlong , jobjectArray, jlong, jboolean);
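Both store functions return the address of a heap-allocated `std::vector` as a `jlong`, which the Java side hands back on later calls. A minimal sketch of that round trip, assuming a 64-bit platform where a pointer fits in 64 bits; `toHandle`, `fromHandle` and `freeHandle` are hypothetical helpers, not functions from this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumes a 64-bit platform, where a pointer fits in a 64-bit jlong.
static_assert(sizeof(void*) <= sizeof(std::int64_t), "pointer must fit in a 64-bit handle");

// Hypothetical helper: pack a heap-allocated buffer's address into an integer handle,
// the way storeByteVectorData returns its std::vector<uint8_t>* as a jlong.
std::int64_t toHandle(std::vector<std::uint8_t>* vec) {
    return static_cast<std::int64_t>(reinterpret_cast<std::intptr_t>(vec));
}

// Hypothetical helper: recover the buffer from its handle on a later call.
std::vector<std::uint8_t>* fromHandle(std::int64_t handle) {
    assert(handle != 0 && "a handle of 0 means no buffer has been allocated yet");
    return reinterpret_cast<std::vector<std::uint8_t>*>(static_cast<std::intptr_t>(handle));
}

// Hypothetical helper mirroring the "free up the memory" step. It must be called
// exactly once, and the handle must not be reused afterwards (undefined behavior).
void freeHandle(std::int64_t handle) {
    delete fromHandle(handle);
}
```

The handle carries no ownership information, which is why the CAUTION note matters: nothing stops a caller from passing a handle after it has been freed.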

/**
* Free up the memory allocated for the data stored in memory address. This function should be used with the memory
87 changes: 53 additions & 34 deletions jni/include/faiss_index_service.h
@@ -31,38 +31,41 @@ namespace faiss_wrapper {
class IndexService {
public:
IndexService(std::unique_ptr<FaissMethods> faissMethods);
//TODO Remove dependency on JNIUtilInterface and JNIEnv
//TODO Reduce the number of parameters

/**
* Create index
* Initialize index
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param dim dimension of vectors
* @param numVectors number of vectors
* @param threadCount number of threads to use while adding data
* @param parameters parameters to be applied to faiss index
* @return memory address of the native index object
*/
virtual jlong initIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, faiss::MetricType metric, std::string indexDescription, int dim, int numVectors, int threadCount, std::unordered_map<std::string, jobject> parameters);
/**
* Add vectors to index
*
* @param dim dimension of vectors
* @param numIds number of vectors
* @param threadCount number of threads to use while adding data
* @param vectorsAddress memory address which is holding vector data
* @param ids a list of document ids for corresponding vectors
* @param idMapAddress memory address of the native index object
*/
virtual void insertToIndex(int dim, int numIds, int threadCount, int64_t vectorsAddress, std::vector<int64_t> &ids, jlong idMapAddress);
/**
* Write index to disk
*
* @param threadCount number of threads to use while adding data
* @param indexPath path to write index
* @param parameters parameters to be applied to faiss index
* @param idMapAddress memory address of the native index object
*/
virtual void createIndex(
knn_jni::JNIUtilInterface * jniUtil,
JNIEnv * env,
faiss::MetricType metric,
std::string indexDescription,
int dim,
int numIds,
int threadCount,
int64_t vectorsAddress,
std::vector<int64_t> ids,
std::string indexPath,
std::unordered_map<std::string, jobject> parameters);
virtual void writeIndex(std::string indexPath, jlong idMapAddress);
virtual ~IndexService() = default;
protected:
virtual void allocIndex(faiss::Index * index, size_t dim, size_t numVectors);
std::unique_ptr<FaissMethods> faissMethods;
};
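Splitting the old single-shot `createIndex` into `initIndex`, `insertToIndex` and `writeIndex` lets vectors be added in batches, which is how the changelog's memory-footprint improvement is meant to come about. The sketch below mimics that lifecycle with a hypothetical `ToyIndex` (a flat list of ids and vectors, not FAISS); the function names are illustrative analogues, not the service's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a native index: ids plus row-major vectors.
// It is NOT FAISS; it only illustrates the init -> insert (repeated) -> write flow.
struct ToyIndex {
    int dim;
    std::vector<std::int64_t> ids;
    std::vector<float> data;
};

// Analogous to initIndex: allocate the index once and reserve room for numVectors.
ToyIndex* initToyIndex(int dim, int numVectors) {
    auto* index = new ToyIndex{dim, {}, {}};
    index->ids.reserve(static_cast<std::size_t>(numVectors));
    index->data.reserve(static_cast<std::size_t>(numVectors) * static_cast<std::size_t>(dim));
    return index;
}

// Analogous to insertToIndex: append one batch of numIds vectors read from `vectors`.
void insertToToyIndex(ToyIndex* index, const float* vectors, int numIds,
                      const std::vector<std::int64_t>& ids) {
    assert(static_cast<int>(ids.size()) == numIds);
    const std::size_t count = static_cast<std::size_t>(numIds) * static_cast<std::size_t>(index->dim);
    index->ids.insert(index->ids.end(), ids.begin(), ids.end());
    index->data.insert(index->data.end(), vectors, vectors + count);
}

// Analogous to writeIndex: serialize the accumulated index (here: dim, count, ids).
// In this sketch the caller still owns `index` and deletes it afterwards.
std::string writeToyIndex(const ToyIndex* index) {
    std::ostringstream out;
    out << index->dim << ' ' << index->ids.size();
    for (std::int64_t id : index->ids) {
        out << ' ' << id;
    }
    return out.str();
}
```

Because the index object outlives each insert call, the caller only needs one batch of vectors in native memory at a time; the index itself still grows to hold everything.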

@@ -76,7 +79,21 @@ class BinaryIndexService : public IndexService {
//TODO Reduce the number of parameters
BinaryIndexService(std::unique_ptr<FaissMethods> faissMethods);
/**
* Create binary index
* Initialize index
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param dim dimension of vectors
* @param numVectors number of vectors
* @param threadCount number of threads to use while adding data
* @param parameters parameters to be applied to faiss index
* @return memory address of the native index object
*/
virtual jlong initIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, faiss::MetricType metric, std::string indexDescription, int dim, int numVectors, int threadCount, std::unordered_map<std::string, jobject> parameters) override;
/**
* Add vectors to index
*
* @param jniUtil jni util
* @param env jni environment
@@ -86,28 +103,30 @@ class BinaryIndexService : public IndexService {
* @param numIds number of vectors
* @param threadCount number of threads to use while adding data
* @param vectorsAddress memory address which is holding vector data
* @param ids a list of document ids for corresponding vectors
* @param idMap a map of document id and vector id
* @param parameters parameters to be applied to faiss index
*/
virtual void insertToIndex(int dim, int numIds, int threadCount, int64_t vectorsAddress, std::vector<int64_t> &ids, jlong idMapAddress) override;
/**
* Write index to disk
*
* @param jniUtil jni util
* @param env jni environment
* @param metric space type for distance calculation
* @param indexDescription index description to be used by faiss index factory
* @param threadCount number of threads to use while adding data
* @param indexPath path to write index
* @param idMapAddress memory address of the native index object
* @param parameters parameters to be applied to faiss index
*/
virtual void createIndex(
knn_jni::JNIUtilInterface * jniUtil,
JNIEnv * env,
faiss::MetricType metric,
std::string indexDescription,
int dim,
int numIds,
int threadCount,
int64_t vectorsAddress,
std::vector<int64_t> ids,
std::string indexPath,
std::unordered_map<std::string, jobject> parameters
) override;
virtual void writeIndex(std::string indexPath, jlong idMapAddress) override;
virtual ~BinaryIndexService() = default;
protected:
virtual void allocIndex(faiss::Index * index, size_t dim, size_t numVectors) override;
};

}
}


#endif //OPENSEARCH_KNN_FAISS_INDEX_SERVICE_H
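`BinaryIndexService` overrides each stage for binary vectors. Assuming it follows FAISS's binary-index conventions (dimension counted in bits, each vector packed into dim / 8 bytes, Hamming distance between codes), the underlying arithmetic looks like the sketch below. `codeSizeBytes` and `hammingDistance` are illustrative helpers, not functions from this PR.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for one binary vector of `dimBits` bits
// (FAISS binary indexes require the dimension to be a multiple of 8).
std::size_t codeSizeBytes(int dimBits) {
    assert(dimBits > 0 && dimBits % 8 == 0);
    return static_cast<std::size_t>(dimBits) / 8;
}

// Hamming distance between two packed codes: the number of differing bits.
int hammingDistance(const std::uint8_t* a, const std::uint8_t* b, std::size_t codeSize) {
    int distance = 0;
    for (std::size_t i = 0; i < codeSize; ++i) {
        // XOR marks differing bits; bitset::count() counts them.
        distance += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return distance;
}
```

Under the same assumption, a batch of numIds binary vectors passed via vectorsAddress would occupy numIds * codeSizeBytes(dim) bytes.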
9 changes: 5 additions & 4 deletions jni/include/faiss_wrapper.h
@@ -18,10 +18,11 @@

namespace knn_jni {
namespace faiss_wrapper {
// Create an index with ids and vectors. The configuration is defined by values in the Java map, parametersJ.
// The index is serialized to indexPathJ.
void CreateIndex(knn_jni::JNIUtilInterface * jniUtil, JNIEnv * env, jintArray idsJ, jlong vectorsAddressJ, jint dimJ,
jstring indexPathJ, jobject parametersJ, IndexService* indexService);
jlong InitIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong numDocs, jint dimJ, jobject parametersJ, IndexService *indexService);

void InsertToIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jintArray idsJ, jlong vectorsAddressJ, jint dimJ, jlong indexAddr, jint threadCount, IndexService *indexService);

void WriteIndex(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jstring indexPathJ, jlong indexAddr, IndexService *indexService);
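`InitIndex` receives the total document count up front, while `InsertToIndex` takes one batch of ids plus a `vectorsAddressJ` per call. How the Java caller sizes those batches is not shown in this diff; the sketch below assumes a fixed, hypothetical `batchSize` and only shows how consecutive batches can cover every document exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One call's worth of work for an InsertToIndex-style function.
struct Batch {
    std::int64_t start;  // index of the first document in this batch
    std::int64_t count;  // number of documents in this batch
};

// Hypothetical planner: split numDocs documents into consecutive batches of at
// most batchSize documents; the last batch holds whatever remains.
std::vector<Batch> planBatches(std::int64_t numDocs, std::int64_t batchSize) {
    assert(numDocs >= 0 && batchSize > 0);
    std::vector<Batch> batches;
    for (std::int64_t start = 0; start < numDocs; start += batchSize) {
        batches.push_back({start, std::min(batchSize, numDocs - start)});
    }
    return batches;
}
```

Each planned batch would correspond to one InsertToIndex call against the same indexAddr, with the vector buffer refilled between calls.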

// Create an index with ids and vectors. Instead of creating a new index, this function creates the index
// based off of the template index passed in. The index is serialized to indexPathJ.
49 changes: 40 additions & 9 deletions jni/include/org_opensearch_knn_jni_FaissService.h
@@ -18,23 +18,54 @@
#ifdef __cplusplus
extern "C" {
#endif

/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createIndex
* Method: initIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_createIndex
(JNIEnv *, jclass, jintArray, jlong, jint, jstring, jobject);

JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_FaissService_initIndex(JNIEnv * env, jclass cls,
jlong numDocs, jint dimJ,
jobject parametersJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createBinaryIndex
* Method: initBinaryIndex
* Signature: ([IJILjava/lang/String;Ljava/util/Map;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_createBinaryIndex
(JNIEnv *, jclass, jintArray, jlong, jint, jstring, jobject);

JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_FaissService_initBinaryIndex(JNIEnv * env, jclass cls,
jlong numDocs, jint dimJ,
jobject parametersJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: insertToIndex
* Signature: ([IJIJI)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_insertToIndex(JNIEnv * env, jclass cls, jintArray idsJ,
jlong vectorsAddressJ, jint dimJ,
jlong indexAddress, jint threadCount);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: insertToBinaryIndex
* Signature: ([IJIJI)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_insertToBinaryIndex(JNIEnv * env, jclass cls, jintArray idsJ,
jlong vectorsAddressJ, jint dimJ,
jlong indexAddress, jint threadCount);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: writeIndex
* Signature: (JLjava/lang/String;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_writeIndex(JNIEnv * env, jclass cls,
jlong indexAddress,
jstring indexPathJ);
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: writeBinaryIndex
* Signature: (JLjava/lang/String;)V
*/
JNIEXPORT void JNICALL Java_org_opensearch_knn_jni_FaissService_writeBinaryIndex(JNIEnv * env, jclass cls,
jlong indexAddress,
jstring indexPathJ);
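Each `Signature:` line in these javah-style comments is meant to be the JNI method descriptor of the Java native method. The descriptors can be derived from the native parameter lists (`jintArray` → `[I`, `jlong` → `J`, `jint` → `I`, `jstring` → `Ljava/lang/String;`, `jboolean` → `Z`, void → `V`), assuming the Java methods use exactly these types and, as the original comments assume, a `java.util.Map` for the `jobject` parameters. `methodDescriptor` is a hypothetical helper for checking them:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: assemble a JNI method descriptor "(<params>)<return>".
// Per-type codes follow the JNI type-signature rules: Z boolean, B byte, I int,
// J long, F float, V void, "[T" array of T, "Lpkg/Name;" object of class pkg.Name.
std::string methodDescriptor(const std::vector<std::string>& params, const std::string& ret) {
    assert(!ret.empty() && "a descriptor always has a return type, V for void");
    std::string desc = "(";
    for (const auto& p : params) {
        desc += p;
    }
    desc += ")";
    desc += ret;
    return desc;
}
```

For example, `insertToIndex(int[], long, int, long, int)` returning void yields `([IJIJI)V`, and `writeIndex(long, String)` yields `(JLjava/lang/String;)V`.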
/*
* Class: org_opensearch_knn_jni_FaissService
* Method: createIndexFromTemplate
6 changes: 3 additions & 3 deletions jni/include/org_opensearch_knn_jni_JNICommons.h
@@ -21,18 +21,18 @@ extern "C" {
/*
* Class: org_opensearch_knn_jni_JNICommons
* Method: storeVectorData
* Signature: (J[[FJJ)
* Signature: (J[[FJZ)J
*/
JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_JNICommons_storeVectorData
(JNIEnv *, jclass, jlong, jobjectArray, jlong);
(JNIEnv *, jclass, jlong, jobjectArray, jlong, jboolean);

/*
* Class: org_opensearch_knn_jni_JNICommons
* Method: storeByteVectorData
* Signature: (J[[BJZ)J
*/
JNIEXPORT jlong JNICALL Java_org_opensearch_knn_jni_JNICommons_storeByteVectorData
(JNIEnv *, jclass, jlong, jobjectArray, jlong);
(JNIEnv *, jclass, jlong, jobjectArray, jlong, jboolean);

/*
* Class: org_opensearch_knn_jni_JNICommons
14 changes: 12 additions & 2 deletions jni/src/commons.cpp
@@ -18,29 +18,39 @@
#include "commons.h"

jlong knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong memoryAddressJ,
jobjectArray dataJ, jlong initialCapacityJ) {
jobjectArray dataJ, jlong initialCapacityJ, jboolean appendJ) {
std::vector<float> *vect;
if ((long) memoryAddressJ == 0) {
vect = new std::vector<float>();
vect->reserve((long)initialCapacityJ);
} else {
vect = reinterpret_cast<std::vector<float>*>(memoryAddressJ);
}

if (appendJ == JNI_FALSE) {
vect->clear();
}

int dim = jniUtil->GetInnerDimensionOf2dJavaFloatArray(env, dataJ);
jniUtil->Convert2dJavaObjectArrayAndStoreToFloatVector(env, dataJ, dim, vect);

return (jlong) vect;
}

jlong knn_jni::commons::storeByteVectorData(knn_jni::JNIUtilInterface *jniUtil, JNIEnv *env, jlong memoryAddressJ,
jobjectArray dataJ, jlong initialCapacityJ) {
jobjectArray dataJ, jlong initialCapacityJ, jboolean appendJ) {
std::vector<uint8_t> *vect;
if ((long) memoryAddressJ == 0) {
vect = new std::vector<uint8_t>();
vect->reserve((long)initialCapacityJ);
} else {
vect = reinterpret_cast<std::vector<uint8_t>*>(memoryAddressJ);
}

if (appendJ == JNI_FALSE) {
vect->clear();
}

int dim = jniUtil->GetInnerDimensionOf2dJavaByteArray(env, dataJ);
jniUtil->Convert2dJavaObjectArrayAndStoreToByteVector(env, dataJ, dim, vect);
