forked from guestrin-lab/ACORN

state-of-the-art search over vector embeddings and structured data (SIGMOD '24)

ACORN

ACORN is an index for state-of-the-art search over vector embeddings and structured data (SIGMOD '24).

You can read more about our work in the paper: ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data

This implementation of the ACORN index is written in C++ and built on top of the FAISS library.

If you run into any issues, please open an issue and we'll respond promptly!

Installation

git clone https://github.com/stanford-futuredata/ACORN.git
cd ACORN
cmake -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_PYTHON=OFF -DBUILD_TESTING=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -B build
make -C build -j faiss

Example Usage

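  1. Initialize the index
int d = 128;      // vector dimension
int M = 32;       // degree bound per graph node
int gamma = 12;   // neighbor expansion factor
int M_beta = 32;  // compression parameter

// ACORN-gamma
faiss::IndexACORNFlat acorn_gamma(d, M, gamma, M_beta);

// ACORN-1 (gamma fixed to 1)
faiss::IndexACORNFlat acorn_1(d, M, 1, M * 2);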
  2. Construct the index
size_t nb, d2;
std::string filename = "your_dataset.fvecs"; // placeholder: path to your fvecs file
float* xb = fvecs_read(filename.c_str(), &d2, &nb);
assert(d == d2 || !"dataset dimension is not as expected");
acorn_gamma.add(nb, xb); // insert all nb base vectors
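
The construction step relies on an `fvecs_read` helper that this README does not define. For readers without one at hand, here is a minimal sketch of such a reader, assuming the common fvecs layout in which every record is a 32-bit integer dimension followed by that many 32-bit floats. The name `read_fvecs` and its signature are hypothetical; it is not part of ACORN or FAISS, and it returns a `std::vector<float>` rather than a raw pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of ACORN or FAISS): reads an .fvecs file
// into a flat row-major buffer of n_out vectors with d_out floats each.
// Assumed layout: each record is an int32 dimension followed by that many
// float32 values.
std::vector<float> read_fvecs(const std::string& path,
                              std::size_t& d_out,
                              std::size_t& n_out) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<float> data;
    d_out = 0;
    n_out = 0;
    std::int32_t d = 0;
    // Read one record per iteration until the file is exhausted.
    while (std::fread(&d, sizeof(d), 1, f) == 1) {
        const bool bad_dim =
                d <= 0 || (n_out > 0 && static_cast<std::size_t>(d) != d_out);
        if (bad_dim) {
            std::fclose(f);
            throw std::runtime_error("inconsistent dimension in " + path);
        }
        d_out = static_cast<std::size_t>(d);
        const std::size_t old_size = data.size();
        data.resize(old_size + d_out);
        if (std::fread(data.data() + old_size, sizeof(float), d_out, f) != d_out) {
            std::fclose(f);
            throw std::runtime_error("truncated record in " + path);
        }
        ++n_out;
    }
    std::fclose(f);
    return data;
}
```

With this sketch, the buffer would be passed to the index as `acorn_gamma.add(nb, xb.data())` after checking that the returned dimension matches `d`.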
  3. Search the index
// ... load nq queries as array xq, and choose k (neighbors per query)
// ... load per-vector attributes as array metadata (length nb)
// ... load attribute filters as array aq (length nq)

std::vector<faiss::idx_t> nns2(k * nq);
std::vector<float> dis2(k * nq);

// create filter_ids_map to specify the passing entities for each predicate:
// entry [iq * nb + ib] is 1 if base vector ib satisfies query iq's filter
std::vector<char> filter_ids_map(nq * nb);
for (size_t iq = 0; iq < nq; iq++) {
    for (size_t ib = 0; ib < nb; ib++) {
        filter_ids_map[iq * nb + ib] = (bool) (metadata[ib] == aq[iq]);
    }
}

// perform efficient hybrid search
acorn_gamma.search(nq, xq, k, dis2.data(), nns2.data(), filter_ids_map.data());
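
The filter map in the search step can be factored into a small standalone helper. The sketch below assumes that attributes and query filters are plain `int` values compared by equality, as in the loop above; the README does not state their types. The names `build_filter_ids_map` and `count_passing` are hypothetical and not part of the ACORN API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the row-major nq x nb pass/fail map for
// equality predicates. Entry [q * nb + b] is 1 when base vector b
// satisfies the predicate of query q, and 0 otherwise.
// Assumption: attributes and filters are ints compared with ==.
std::vector<char> build_filter_ids_map(const std::vector<int>& metadata,
                                       const std::vector<int>& aq) {
    const std::size_t nb = metadata.size();
    const std::size_t nq = aq.size();
    std::vector<char> map(nq * nb, 0);
    for (std::size_t q = 0; q < nq; ++q) {
        for (std::size_t b = 0; b < nb; ++b) {
            map[q * nb + b] = (metadata[b] == aq[q]) ? 1 : 0;
        }
    }
    return map;
}

// Counts how many base vectors pass the predicate of query q.
std::size_t count_passing(const std::vector<char>& map,
                          std::size_t nb,
                          std::size_t q) {
    std::size_t count = 0;
    for (std::size_t b = 0; b < nb; ++b) {
        count += (map[q * nb + b] != 0) ? 1 : 0;
    }
    return count;
}
```

The map occupies `nq * nb` bytes, so for large datasets it may be worth building it for one batch of queries at a time.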
