Reindexing is getting stalled #30513

Open
dainiusjocas opened this issue Mar 8, 2024 · 1 comment
dainiusjocas commented Mar 8, 2024

Describe the bug
When reindexing is triggered and one or more synthetic fields in the indexing script invoke embed, progress appears to stall.

To Reproduce
Steps to reproduce the behavior:

  1. On an existing index with a field like:
field chunks type array<string> {}
  2. Add a synthetic field:
field colbert type tensor<int8>(context{}, token{}, v[16]) {
   indexing: input chunks | embed colbert context | attribute
}
  3. Trigger reindexing.
  4. After some initial progress (see screenshot below), reindexing stops progressing.
  5. Sometimes the reindexing fails with this status:
{
  "enabled": true,
  "clusters": {
    "realm": {
      "pending": {},
      "ready": {
        "realm": {
          "readyMillis": 1709661753702,
          "speed": 1.0,
          "cause": "reindexing for an unknown reason",
          "startedMillis": 1709664660006,
          "endedMillis": 1709701153065,
          "message": "PROCESSING_FAILURE: ReturnCode(PROCESSING_FAILURE, [from content node 1] Time is up.)",
          "progress": 0.0,
          "state": "failed"
        }
      }
    }
  }
}
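To put the timestamps in that status response into perspective, here is a small sketch that converts them to a run duration. It only uses the values copied from the response above; the helper names (`to_utc`, `hours_between`) are made up for illustration and are not part of any Vespa API.

```python
import json
from datetime import datetime, timezone

# Values copied from the "realm" entry of the status response above.
status = json.loads("""
{
  "readyMillis": 1709661753702,
  "startedMillis": 1709664660006,
  "endedMillis": 1709701153065,
  "progress": 0.0,
  "state": "failed"
}
""")


def to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def hours_between(start_millis, end_millis):
    """Elapsed time between two epoch-millisecond timestamps, in hours."""
    return (end_millis - start_millis) / 3_600_000


run_hours = hours_between(status["startedMillis"], status["endedMillis"])

print("started:", to_utc(status["startedMillis"]).isoformat())
print("ended:  ", to_utc(status["endedMillis"]).isoformat())
print(f"ran for {run_hours:.1f} h, "
      f"progress={status['progress']}, state={status['state']}")
```

In other words, this attempt ran for roughly 10.1 hours before ending in the failed state, and the reported progress was still 0.0.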

Expected behavior
I understand that inference on CPU takes time and that embedding arrays of strings is not the best idea.
It would be great to have more control over reindexing:

  • Set a timeout for the document.
  • Since reindexing is a visiting operation, it would be great to be able to select only a subset of documents to process, at the cost of undefined ranking.

Also, more visibility into progress would be nice, for example a count of documents reindexed so far.
Furthermore, it would be great if recalculating embeddings for synthetic fields could be skipped when the input has not changed, for example by comparing hashes.
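To make the hash idea concrete, here is a minimal sketch of what I mean. This is not existing Vespa functionality and does not use any Vespa API: the `doc` layout, the in-memory `cache`, and the `fake_embed` stand-in are all hypothetical, just to show the skip logic.

```python
import hashlib
import json


def content_hash(chunks):
    """Stable SHA-256 digest over the list of input strings."""
    payload = json.dumps(chunks, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def reembed_if_changed(doc, embed, cache):
    """Call embed(chunks) only when the chunks differ from the cached hash.

    doc:   dict with hypothetical keys "id" and "chunks".
    cache: dict mapping document id -> (hash, embedding).
    Returns (embedding, recomputed).
    """
    digest = content_hash(doc["chunks"])
    cached = cache.get(doc["id"])
    if cached is not None and cached[0] == digest:
        return cached[1], False  # input unchanged: reuse stored embedding
    embedding = embed(doc["chunks"])
    cache[doc["id"]] = (digest, embedding)
    return embedding, True


# Stand-in for the expensive embedder; it records every call it receives.
calls = []


def fake_embed(chunks):
    calls.append(list(chunks))
    return [len(c) for c in chunks]


cache = {}
doc = {"id": "doc-1", "chunks": ["first chunk", "second chunk"]}

_, first = reembed_if_changed(doc, fake_embed, cache)   # computes
_, second = reembed_if_changed(doc, fake_embed, cache)  # skipped
doc["chunks"].append("third chunk")
_, third = reembed_if_changed(doc, fake_embed, cache)   # input changed

print(first, second, third, len(calls))
```

A real version would presumably also have to include the embedder and model identity in the hash, so that changing the model still forces a recalculation.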

Screenshots
Added dashboard.
Screenshot 2024-03-08 at 09 38 35

Environment (please complete the following information):

  • Google Cloud, GKE, deployed with custom Helm charts.

Vespa version
8.307.19

Additional context
Slack thread.
An interesting discovery: when persearch was reduced from the number of available CPU cores to 1, reindexing started progressing.

@kkraune kkraune added this to the soon milestone Mar 13, 2024
@baldersheim baldersheim removed their assignment Jul 1, 2024
@baldersheim baldersheim removed this from the soon milestone Jul 1, 2024
@kkraune kkraune added this to the soon milestone Jul 31, 2024
kkraune commented Jul 31, 2024

@jonmv can you look at the timeout value?
