reindex also when embedding parameters change #5

Open
oatmealm opened this issue Dec 12, 2024 · 1 comment
Comments

oatmealm commented Dec 12, 2024

I deleted the db to reindex my documents and see the effect of different chunking and overlap settings (these are local files; my literature is mostly in the humanities, as clean markdown or text files). Would it be useful to detect changes to those settings in the config file and reindex automatically?

@jkitchin
Owner

I will have to think about how to do that. The only thing that comes to mind is to store an md5 hash of the config file somewhere. That is too sensitive though: you don't want to trigger reindexing just because you changed a space or an email address, and it isn't very future-proof if choosing an embedding model gets more flexible.
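One way to make a hash less sensitive is to hash only the settings that affect embeddings instead of the whole file. The following is a minimal sketch of that idea, not what the project does: the key names (`model`, `chunk_size`, `chunk_overlap`) and both function names are hypothetical, and the real config schema may differ.

```python
import hashlib
import json
from typing import Optional

# Hypothetical key names: assumed to be the only settings that affect embeddings.
EMBEDDING_KEYS = ("model", "chunk_size", "chunk_overlap")


def embedding_fingerprint(config: dict) -> str:
    """Hash only the embedding-related settings, ignoring everything else."""
    relevant = {key: config.get(key) for key in EMBEDDING_KEYS}
    # sort_keys and fixed separators make the serialization independent of
    # key order and whitespace in the original config file.
    payload = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reindex(config: dict, stored_fingerprint: Optional[str]) -> bool:
    """True when no fingerprint was stored yet or the relevant settings changed."""
    return stored_fingerprint != embedding_fingerprint(config)
```

With this approach, editing an email address or whitespace leaves the fingerprint unchanged, while changing the chunk size changes it. The list of keys would still need updating if model selection becomes more flexible.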

It is also not necessary to delete the db. I think you would just iterate over the rows, get the text, and update the embedding. I added a function to do that.
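The row-by-row update described above could look roughly like the sketch below. It is an illustration only and not the function that was added: it assumes a SQLite database with a hypothetical `chunks(id, text, embedding)` table, a caller-supplied `embed` function, and embeddings stored as packed 32-bit floats. It also only covers re-embedding existing text; a change to chunk size or overlap would require re-splitting the source documents as well.

```python
import sqlite3
import struct
from typing import Callable, Sequence


def reembed_all(
    conn: sqlite3.Connection,
    embed: Callable[[str], Sequence[float]],
) -> int:
    """Recompute the embedding for every stored row; return the row count."""
    # Hypothetical schema: chunks(id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)
    rows = conn.execute("SELECT id, text FROM chunks").fetchall()
    for row_id, text in rows:
        vector = embed(text)
        # Store the vector as packed 32-bit floats (an assumed storage format).
        blob = struct.pack(f"{len(vector)}f", *vector)
        conn.execute(
            "UPDATE chunks SET embedding = ? WHERE id = ?", (blob, row_id)
        )
    conn.commit()
    return len(rows)
```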

jkitchin added a commit that referenced this issue Dec 12, 2024
when you change dimensions, you have to remake some columns. See issue #5.

2 participants