Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented ElasticSearchStore - All Actions #1

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

Jose-Perigolo
Copy link

@Jose-Perigolo Jose-Perigolo commented Jun 20, 2024

To run tests locally, make sure you have a '.env.local' file in the format:

ELASTICSEARCH_NODE=http://localhost:9200
ELASTICSEARCH_INDEX=vector_index

Also, you need Docker installed on your machine. Pull the Elastic Docker image from the hub and run the container. You can do this by running the script:

./test/setup.sh        

After that, create the index in your local image:

npm run create-index         

And to run the tests, use:

npm run test         

@BeAllAround
Copy link

BeAllAround commented Jun 20, 2024

I suggest you add those notes into, say, /test/setup.sh. A dev will be able to easily just run it and read your notes.

@BeAllAround
Copy link

I am seeing that most of the files are like this: ElasticsearchStore. Please, replace them with camelCase ElasticSearchStore consistently.

@Jose-Perigolo Jose-Perigolo changed the title Initial Commit - Basic Functionalities Implemented ElasticSearchStore - All functionaties working Jun 21, 2024
@Jose-Perigolo Jose-Perigolo changed the title Implemented ElasticSearchStore - All functionaties working Implemented ElasticSearchStore - All Actions Jun 21, 2024
Copy link

@BeAllAround BeAllAround left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review submitted

});
}

if (msg.directive$ && msg.directive$.vector$) {
Copy link

@BeAllAround BeAllAround Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to separate msg.q from msg. It is not the same. directive$ is supposed to come from the msg.q.

You can do something like this:
const q = msg.q

// use q - no need to reference it every time as msg.q :)
The unit tests happened to pass but you are never actually creating this query since msg.directive$ will always be undefined. Please, write another set of tests validating vector$.k works with returning the correct number of the approximate nearest neighbors. We must use the latest: 8.14.0 of the npm package since the nearest neighbors feature isn't entirely supported by 7.x.x.

Helpful: https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it was a mistake, did not notice because the test created for it was missleading.
Fixed.


save: async function (msg: any, reply: any) {
const ent = msg.ent;
const index = resolveIndex(ent, options);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use semi-colons :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, done!

@@ -0,0 +1,63 @@
const path = require('path');
require('dotenv').config({ path: path.resolve(__dirname, '../.env.local') });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, don't use semi-colons

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

query.bool.must.push({
knn: {
field: vectorFieldName,
query_vector: msg.vector,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comes from the query: q.vector

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed.

knn: {
field: vectorFieldName,
query_vector: msg.vector,
k: msg.directive$.vector$.k,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing this logic here:

null == vector$.k ? 11 : vector$.k

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for it anymore.


if (msg.q) {
Object.keys(msg.q).forEach(key => {
if (key !== 'directive$' && key !== 'vector') {
Copy link

@BeAllAround BeAllAround Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two ways to approach this.
Prefer to use (!key.match(/\$/)) instead of (key !== 'directive$') in this case since there can be other query properties ending in $ such as fields$, limit$, etc.

You can, of course do this.

let q = msg.q

let cq = seneca.util.clean(q) // removes all properties ending in '$'

// use cq, q

But I will let you decide :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please use https://en.wikipedia.org/wiki/Yoda_conditions - that stands for Yoda Style in programming.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using:
let cq = seneca.util.clean(q)

Did not know about this one, thanks. Using it in a new util function for readability:

function buildQuery(cleanedQuery: any) {
  const boolQuery: any = { must: [], filter: [] }
  Object.keys(cleanedQuery).forEach((key) => {
    if ('vector' !== key) {
      boolQuery.filter.push({ term: { [key]: cleanedQuery[key] } })
    }
  })
  return { bool: boolQuery }
}

package.json Outdated
@@ -64,6 +65,6 @@
],
"dependencies": {
"@aws-sdk/credential-provider-node": "^3.525.0",
"@opensearch-project/opensearch": "^2.5.0"
"@elastic/elasticsearch": "^7.17.13"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must use the latest: https://www.npmjs.com/package/@elastic/elasticsearch: 8.14.0

What is @aws-sdk/credential-provider-node needed for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "@elastic/elasticsearch": "^8.14.0"
Removed "@aws-sdk/credential-provider-node": "^3.525.0",

Thanks! Indeed we dont need aws and 8.14.0 is a must for kNN search.

"type": "dense_vector",
"dims": 8,
"index": true, // Enable k-NN indexing
"similarity": "l2_norm" // Specify similarity metric

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use: "similarity": "cosine"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@BeAllAround BeAllAround left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

"type": "dense_vector",
"dims": 8,
"index": true, // Enable k-NN indexing
"similarity": "cosine" // Specify similarity metric

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add this in case we need to be more specific about the index_options for podmind

              "index_options": {
                "type": "hnsw",
                "m": 16,
                "ef_construction": 512
              }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const { hits } = knnResponse
return hits.hits
.filter((hit: any) => 0.5 <= hit._score) // Adjust the threshold based on your similarity measure
.map((hit: any) => ({ id: hit._id, ...hit._source }))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the Array.prototype.filter and in the Array.prototype.map, you can both entize and set the custom$.score.

{ ..., custom$: { score: hit._score } }

Also, make sure you unit test this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please check about these changes to see if its ok.

test: 'knn-search'
});

expect(list.length).toEqual(0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it should be 2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Jose-Perigolo Jose-Perigolo requested a review from BeAllAround July 1, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants