Implemented ElasticSearchStore - All Actions #1
base: main
I suggest you add those notes into, say,
I am seeing that most of the files are like this:
src/ElasticSearchStore.ts (outdated)
```ts
});
}

if (msg.directive$ && msg.directive$.vector$) {
```
You need to separate `msg.q` from `msg`; they are not the same. `directive$` is supposed to come from `msg.q`.
You can do something like this:
```ts
const q = msg.q
// use q; no need to reference it every time as msg.q
```
:)
The unit tests happened to pass, but you are never actually creating this query, since `msg.directive$` will always be `undefined`. Please write another set of tests validating that `vector$.k` returns the correct number of approximate nearest neighbors. We must use the latest version of the npm package, 8.14.0, since the nearest-neighbors feature isn't fully supported by 7.x.x.
Helpful: https://www.elastic.co/blog/introducing-approximate-nearest-neighbor-search-in-elasticsearch-8-0
Indeed, it was a mistake; I did not notice because the test written for it was misleading.
Fixed.
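A minimal TypeScript sketch of the separation the reviewer asks for: the kNN directive and the query vector are both read from `msg.q`, so a `directive$` placed directly on `msg` is simply never seen. The interfaces and the `readVectorDirective` helper are hypothetical illustrations; the store itself types `msg` as `any`.

```typescript
// Hypothetical shapes for illustration; the store itself types msg as `any`.
interface VectorDirective {
  k?: number | null
}

interface StoreQuery {
  vector?: number[]
  directive$?: { vector$?: VectorDirective }
  [key: string]: unknown
}

interface StoreMsg {
  q?: StoreQuery
  [key: string]: unknown
}

// Read the kNN directive from the query (msg.q), never from msg itself.
function readVectorDirective(
  msg: StoreMsg
): { vector: number[]; k: number | null | undefined } | null {
  const q: StoreQuery = msg.q ?? {}
  const vector$ = q.directive$?.vector$
  if (null == vector$ || null == q.vector) {
    return null // not a vector query
  }
  return { vector: q.vector, k: vector$.k }
}

// The bug from the review: directive$ placed on msg is ignored.
const wrong = readVectorDirective({
  directive$: { vector$: { k: 2 } },
  q: { vector: [0.1, 0.2] },
})

// The fix: directive$ travels inside msg.q alongside the vector.
const right = readVectorDirective({
  q: { vector: [0.1, 0.2], directive$: { vector$: { k: 2 } } },
})
```

Here `wrong` is `null` and `right` carries `k: 2`, which is the kind of case the requested `vector$.k` tests should exercise.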
src/ElasticSearchStore.ts (outdated)
```ts
save: async function (msg: any, reply: any) {
  const ent = msg.ent;
  const index = resolveIndex(ent, options);
```
Don't use semi-colons :)
Indeed, done!
test/create_index.js (outdated)
@@ -0,0 +1,63 @@
```js
const path = require('path');
require('dotenv').config({ path: path.resolve(__dirname, '../.env.local') });
```
Again, don't use semi-colons
Done.
src/ElasticSearchStore.ts (outdated)
```ts
query.bool.must.push({
  knn: {
    field: vectorFieldName,
    query_vector: msg.vector,
```
This comes from the query: `q.vector`.
Indeed, fixed.
src/ElasticSearchStore.ts (outdated)
```ts
knn: {
  field: vectorFieldName,
  query_vector: msg.vector,
  k: msg.directive$.vector$.k,
```
You are missing this logic here: `null == vector$.k ? 11 : vector$.k`
There is no need for it anymore.
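For reference, a sketch of where the reviewer's default would sit if a directive ever omits `k`. It assumes the request uses the top-level `knn` option of the Elasticsearch 8.x search API, whose `field`, `query_vector`, `k` and `num_candidates` parameters are real; `buildKnnOption`, `DEFAULT_K` and the 10x candidate factor are hypothetical choices for illustration.

```typescript
// Hypothetical shape of the top-level `knn` search option (Elasticsearch 8.x).
interface KnnOption {
  field: string
  query_vector: number[]
  k: number
  num_candidates: number
}

const DEFAULT_K = 11 // default suggested in the review

function buildKnnOption(
  field: string,
  vector: number[],
  k?: number | null,
  candidateFactor = 10 // hypothetical oversampling factor
): KnnOption {
  // Yoda-style loose null check: falls back for both null and undefined.
  const resolvedK = null == k ? DEFAULT_K : k
  return {
    field,
    query_vector: vector,
    k: resolvedK,
    // num_candidates must be at least k; more candidates trades speed for recall.
    num_candidates: Math.max(resolvedK, resolvedK * candidateFactor),
  }
}

const withDefault = buildKnnOption('vector', [0.1, 0.2]) // k falls back to 11
const explicit = buildKnnOption('vector', [0.1, 0.2], 2) // k stays 2
```

The loose `null ==` comparison is deliberate: it treats a missing `k` and an explicit `null` the same way without also swallowing other falsy values.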
src/ElasticSearchStore.ts (outdated)
```ts
if (msg.q) {
  Object.keys(msg.q).forEach(key => {
    if (key !== 'directive$' && key !== 'vector') {
```
There are two ways to approach this.
Prefer `(!key.match(/\$/))` over `(key !== 'directive$')` here, since there can be other query properties ending in `$`, such as `fields$`, `limit$`, etc.
You can, of course, do this instead:
```ts
let q = msg.q
let cq = seneca.util.clean(q) // removes all properties ending in '$'
// use cq, q
```
But I will let you decide :)
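To make the filtering concrete, here is a small sketch of the behaviour the comment describes. `stripDollarKeys` is a hypothetical local helper that approximates what `seneca.util.clean` is said to do (drop properties whose names end in `$`); it is not Seneca's implementation, and in the store you would just call `seneca.util.clean`.

```typescript
// Hypothetical stand-in: drop every property whose name ends in '$'.
// This is an illustration, not Seneca's own code.
function stripDollarKeys(q: Record<string, unknown>): Record<string, unknown> {
  const cleaned: Record<string, unknown> = {}
  for (const key of Object.keys(q)) {
    // /\$$/ is anchored to the end of the key, unlike /\$/ which matches anywhere.
    if (!/\$$/.test(key)) {
      cleaned[key] = q[key]
    }
  }
  return cleaned
}

const q = {
  title: 'episode-1',
  vector: [0.1, 0.2],
  directive$: { vector$: { k: 2 } },
  limit$: 5,
  fields$: ['title'],
}

// cq keeps title and vector; directive$, limit$ and fields$ are removed.
const cq = stripDollarKeys(q)
```

Either way, the original `q` is kept intact, so `q.directive$` and `q.vector` stay available for the kNN part of the request.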
Also, please use Yoda conditions (Yoda style) in comparisons: https://en.wikipedia.org/wiki/Yoda_conditions
Using:
```ts
let cq = seneca.util.clean(q)
```
I did not know about this one, thanks. I'm using it in a new util function for readability:
```ts
function buildQuery(cleanedQuery: any) {
  const boolQuery: any = { must: [], filter: [] }
  Object.keys(cleanedQuery).forEach((key) => {
    // Every non-vector field becomes an exact-match term filter
    if ('vector' !== key) {
      boolQuery.filter.push({ term: { [key]: cleanedQuery[key] } })
    }
  })
  return { bool: boolQuery }
}
```
package.json (outdated)
@@ -64,6 +65,6 @@
```
],
"dependencies": {
  "@aws-sdk/credential-provider-node": "^3.525.0",
  "@opensearch-project/opensearch": "^2.5.0"
  "@elastic/elasticsearch": "^7.17.13"
```
We must use the latest version, 8.14.0: https://www.npmjs.com/package/@elastic/elasticsearch
What is `@aws-sdk/credential-provider-node` needed for?
Now using `"@elastic/elasticsearch": "^8.14.0"`.
Removed `"@aws-sdk/credential-provider-node": "^3.525.0"`.
Thanks! Indeed, we don't need AWS, and 8.14.0 is a must for kNN search.
test/create_index.js (outdated)
```js
"type": "dense_vector",
"dims": 8,
"index": true, // Enable k-NN indexing
"similarity": "l2_norm" // Specify similarity metric
```
We use `"similarity": "cosine"`.
Done.
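Switching the metric also changes what `_score` means. For `similarity: "cosine"`, Elasticsearch computes the kNN score as `(1 + cosine(query, vector)) / 2`, so scores fall in [0, 1] and a score of 0.5 corresponds to orthogonal vectors (cosine 0), which is where the `0.5 <= hit._score` filter in the store drew its line. The sketch below reproduces that mapping locally, for example to sanity-check expected scores in tests; the function names are illustrative only.

```typescript
// Cosine similarity between two equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // Cosine is undefined for zero-magnitude vectors; Elasticsearch rejects them too.
  if (0 === normA || 0 === normB) throw new Error('zero-magnitude vector')
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Maps cosine in [-1, 1] to a score in [0, 1], as Elasticsearch does for "cosine".
function cosineScore(a: number[], b: number[]): number {
  return (1 + cosine(a, b)) / 2
}

const same = cosineScore([1, 0], [2, 0]) // 1: same direction, magnitude ignored
const orthogonal = cosineScore([1, 0], [0, 3]) // 0.5
const opposite = cosineScore([1, 0], [-1, 0]) // 0
```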
test/create_index.js (outdated)
```js
"type": "dense_vector",
"dims": 8,
"index": true, // Enable k-NN indexing
"similarity": "cosine" // Specify similarity metric
```
Please add this in case we need to be more specific about the `index_options` for podmind:
```json
"index_options": {
  "type": "hnsw",
  "m": 16,
  "ef_construction": 512
}
```
Done
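Putting the agreed settings together, a sketch of the resulting field mapping built in TypeScript: 8 dimensions, cosine similarity, and HNSW with `m` = 16 and `ef_construction` = 512. `buildVectorMapping` is a hypothetical helper, and the field name `vector` is an assumption for illustration.

```typescript
// Hypothetical shape of the dense_vector mapping agreed in this review.
interface DenseVectorMapping {
  type: 'dense_vector'
  dims: number
  index: boolean
  similarity: 'cosine'
  index_options: { type: 'hnsw'; m: number; ef_construction: number }
}

function buildVectorMapping(dims: number, m = 16, efConstruction = 512): DenseVectorMapping {
  return {
    type: 'dense_vector',
    dims,
    index: true, // needed for approximate kNN search
    similarity: 'cosine',
    // HNSW graph parameters: m = neighbours per node, ef_construction = build-time candidate list size
    index_options: { type: 'hnsw', m, ef_construction: efConstruction },
  }
}

// The field name `vector` is an assumption for illustration.
const mappings = {
  properties: {
    vector: buildVectorMapping(8),
  },
}
```

With the 8.x client, an object shaped like `mappings` would typically be passed as `client.indices.create({ index, mappings })`; keeping `m` and `ef_construction` as parameters leaves room to tune them for podmind later.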
src/ElasticSearchStore.ts (outdated)
```ts
const { hits } = knnResponse
return hits.hits
  .filter((hit: any) => 0.5 <= hit._score) // Adjust the threshold based on your similarity measure
  .map((hit: any) => ({ id: hit._id, ...hit._source }))
```
Remove the `Array.prototype.filter`, and in the `Array.prototype.map` you can both entize and set `custom$.score`: `{ ..., custom$: { score: hit._score } }`
Also, make sure you unit test this.
Done. Please check these changes to see if they're OK.
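A sketch of the suggested mapping step, assuming the response hits expose `_id`, `_score` and `_source` as in the Elasticsearch search API. `hitsToItems` is a hypothetical name; it returns plain objects, and turning them into Seneca entities is left to the store.

```typescript
// Minimal hit shape; the real search response carries more fields.
interface Hit<T> {
  _id: string
  _score: number | null
  _source?: T
}

// No score threshold: every kNN hit is kept and its score travels in custom$.
function hitsToItems<T extends object>(hits: Hit<T>[]) {
  return hits.map((hit) => ({
    id: hit._id,
    ...(hit._source ?? ({} as T)),
    custom$: { score: hit._score },
  }))
}

const items = hitsToItems([
  { _id: 'a', _score: 0.93, _source: { title: 'episode-1' } },
  { _id: 'b', _score: 0.41, _source: { title: 'episode-2' } },
])
// Both hits are kept; the old 0.5 filter would have dropped the second one.
```

Dropping the filter leaves relevance cut-offs to the caller, who can read `custom$.score` and decide.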
test/ElasticSearch.test.ts (outdated)
```ts
  test: 'knn-search'
});

expect(list.length).toEqual(0);
```
Actually, it should be `2`.
Done
To run tests locally, make sure you have a '.env.local' file in the format:
Also, you need Docker installed on your machine. Pull the Elastic Docker image from Docker Hub and run the container. You can do this by running the script:
After that, create the index in your local image:
And to run the tests, use:
npm run test