Duplicate data in search #661

Open
Isak-Kallini opened this issue Dec 20, 2024 · 5 comments
Labels
bug Something isn't working as intended

Comments

@Isak-Kallini
Member

Description

image
I think this happens because the daily sync runs in every instance at the same time (e.g. 4 instances in the screenshot).

Steps to reproduce

No response

Further information

No response

@Isak-Kallini Isak-Kallini added the bug Something isn't working as intended label Dec 20, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Web Dec 20, 2024
@alfredgrip
Contributor

Is there any way to ensure only one instance runs this? @danieladugyan

@alfredgrip
Contributor

I think this could have been solved if we didn't generate ids when putting data into Meilisearch and instead used our primary keys from Prisma. But this hasn't worked for me, since I think some of the ids we use for some tables aren't compatible with Meilisearch. Meilisearch likes UUIDs, but for Positions we use ids like dsek.infu.dwww.mdlm, and I suspect that Meilisearch doesn't like this form of id.
If we didn't generate ids, Meilisearch would see that we are sending duplicate data and would not create four different entries.
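
To illustrate the point above, here is a minimal sketch of why stable ids avoid the duplicates. It does not use the real Meilisearch client: `InMemoryIndex`, `generatedId`, and the sample position are hypothetical stand-ins, and the only behaviour assumed is that adding a document whose id already exists replaces the old document.

```typescript
// Hypothetical stand-in for a search index; this is NOT the Meilisearch client.
type SearchDoc = { id: string; name: string };

class InMemoryIndex {
  private docs = new Map<string, SearchDoc>();

  // Add-or-replace: a document with an already known id overwrites the old one.
  addDocuments(docs: SearchDoc[]): void {
    for (const doc of docs) {
      this.docs.set(doc.id, doc);
    }
  }

  get size(): number {
    return this.docs.size;
  }
}

// Hypothetical id generator: every call yields a fresh id, like generating ids at sync time.
let counter = 0;
const generatedId = (): string => `generated-${counter++}`;

// Hypothetical database row with a stable primary key.
const position = { id: "some-stable-key", name: "Example position" };

const withGeneratedIds = new InMemoryIndex();
const withPrimaryKeys = new InMemoryIndex();

// Simulate four instances each running the same daily sync.
for (let instance = 0; instance < 4; instance++) {
  withGeneratedIds.addDocuments([{ id: generatedId(), name: position.name }]);
  withPrimaryKeys.addDocuments([{ id: position.id, name: position.name }]);
}

console.log(withGeneratedIds.size); // 4 (one entry per instance)
console.log(withPrimaryKeys.size); // 1 (later syncs overwrite the first)
```

With stable ids the sync becomes idempotent, so running it four times costs extra work but no longer produces extra search results.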

@alfredgrip
Contributor

alfredgrip commented Dec 20, 2024

Just confirmed this:

The document id must be an integer or a string. If the id is a string, it can only contain alphanumeric characters (a-z, A-Z, 0-9), hyphens (-), and underscores (_).

https://www.meilisearch.com/docs/learn/getting_started/primary_key#formatting-the-document-id
So the id cannot contain dots (.). Should be an easy fix, but it's a shame that we have to do this conversion.
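
One possible shape for that conversion, as a rough sketch only. The function name `toSearchDocumentId` and the choice of replacing dots with underscores are assumptions for illustration, not necessarily what #662 does. The regular expression mirrors the rule quoted above for string ids.

```typescript
// Allowed characters for a string document id, per the rule quoted above.
const VALID_DOCUMENT_ID = /^[a-zA-Z0-9_-]+$/;

// Hypothetical helper: turn a database primary key into an id the search index accepts.
// Assumption: dots are the only disallowed characters we expect to encounter.
function toSearchDocumentId(primaryKey: string): string {
  const converted = primaryKey.replace(/\./g, "_");
  if (!VALID_DOCUMENT_ID.test(converted)) {
    throw new Error(`Cannot convert "${primaryKey}" to a valid document id`);
  }
  return converted;
}

console.log(toSearchDocumentId("dsek.infu.dwww.mdlm")); // dsek_infu_dwww_mdlm
```

Note that this mapping is only collision-free if the original keys never contain underscores themselves (a.b and a_b would both become a_b), so it may be worth checking the existing ids before relying on it.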

@alfredgrip
Contributor

Added a commit in #662 which should solve this temporarily, but ideally the sync should run on only one instance.

@alfredgrip
Contributor

Maybe something like this could solve this issue?
https://greenydev.com/blog/pm2-cron-job-multiple-instances/#solution-1-setting-app-name-in-pm2-instances
It should be relevant for all cron jobs we do, not only syncing with Meilisearch.
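
A rough sketch of one way to gate cron jobs to a single instance, assuming the instances are started by PM2 in cluster mode. PM2 sets the `NODE_APP_INSTANCE` environment variable to a distinct number for each instance, starting at 0. This is a related approach and not necessarily the one the linked post describes; the helper name `shouldRunScheduledJobs` is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: only the first PM2 instance (or a process started
// without PM2, where the variable is unset) should schedule cron jobs.
function shouldRunScheduledJobs(env: Env): boolean {
  const instance = env["NODE_APP_INSTANCE"];
  return instance === undefined || instance === "0";
}

// Intended usage at startup (left as a comment to keep the sketch self-contained):
// if (shouldRunScheduledJobs(process.env)) {
//   /* register the daily Meilisearch sync and any other cron jobs here */
// }

console.log(shouldRunScheduledJobs({ NODE_APP_INSTANCE: "0" })); // true
console.log(shouldRunScheduledJobs({ NODE_APP_INSTANCE: "3" })); // false
console.log(shouldRunScheduledJobs({})); // true
```

Treating an unset variable as "run the jobs" keeps local development working without PM2; whether that default is desirable is a judgment call.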
