Reindexing is getting stalled #30513

dainiusjocas · 2024-03-08T07:59:39Z

Describe the bug
When reindexing is triggered and one or more of the synthetic fields in the indexing script invokes embed, progress seems to stall.

To Reproduce
Steps to reproduce the behavior:

On an existing index with a field like:

field chunks type array<string> {}

Add a synthetic field:

field colbert type tensor<int8>(context{}, token{}, v[16]) {
   indexing: input chunks | embed colbert context | attribute
}

Trigger reindexing.
After some initial progress (see screenshot below), reindexing stops making progress.
Sometimes the reindexing fails with this status:

{
  "enabled": true,
  "clusters": {
    "realm": {
      "pending": {},
      "ready": {
        "realm": {
          "readyMillis": 1709661753702,
          "speed": 1.0,
          "cause": "reindexing for an unknown reason",
          "startedMillis": 1709664660006,
          "endedMillis": 1709701153065,
          "message": "PROCESSING_FAILURE: ReturnCode(PROCESSING_FAILURE, [from content node 1] Time is up.)",
          "progress": 0.0,
          "state": "failed"
        }
      }
    }
  }
}
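For anyone triaging a similar failure, here is a minimal Python sketch that summarizes a status document of the shape shown above. The function name `summarize_reindexing` is hypothetical, and the only assumption about the JSON is the nesting visible in this report (`clusters` → cluster name → `ready` → document type); it does not call any Vespa API.

```python
import json

# Status as pasted in this issue (trimmed to the fields the sketch uses).
STATUS_JSON = """
{
  "enabled": true,
  "clusters": {
    "realm": {
      "pending": {},
      "ready": {
        "realm": {
          "startedMillis": 1709664660006,
          "endedMillis": 1709701153065,
          "message": "PROCESSING_FAILURE: ReturnCode(PROCESSING_FAILURE, [from content node 1] Time is up.)",
          "progress": 0.0,
          "state": "failed"
        }
      }
    }
  }
}
"""

def summarize_reindexing(status: dict) -> list:
    """Return one summary row per (cluster, document type) under 'ready'."""
    rows = []
    for cluster, info in status.get("clusters", {}).items():
        for doc_type, entry in info.get("ready", {}).items():
            started = entry.get("startedMillis")
            ended = entry.get("endedMillis")
            duration_hours = None
            if started is not None and ended is not None:
                # Timestamps are in milliseconds; 3,600,000 ms per hour.
                duration_hours = (ended - started) / 3_600_000
            rows.append({
                "cluster": cluster,
                "type": doc_type,
                "state": entry.get("state"),
                "progress": entry.get("progress"),
                "duration_hours": duration_hours,
                "message": entry.get("message"),
            })
    return rows

rows = summarize_reindexing(json.loads(STATUS_JSON))
for row in rows:
    print(row)
```

Applied to the status above, this makes it easier to see that the run was in progress for roughly ten hours before being marked failed with progress still at 0.0.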

Expected behavior
I understand that inference on CPU takes time and that embedding arrays of strings is not the best of ideas.
It would be great to have more control over reindexing:

Set a timeout for the document.
Reindexing is visiting, so it would be great to be able to select only a subset of documents to be processed, at the cost of undefined ranking.

Also, more visibility into progress would be nice, for example a count of documents reindexed so far.
Furthermore, it would be great if recalculating embeddings on synthetic fields could be skipped, for example by checking hashes of the input.
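To illustrate the hash idea, here is a rough Python sketch. It is not how Vespa works today and not a proposed implementation. The names `content_hash` and `embed_if_changed`, and the notion of a stored hash per document, are assumptions made only for this example.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

def content_hash(chunks: List[str]) -> str:
    """Hash an array of strings; length-prefix each chunk so that
    ["ab", "c"] and ["a", "bc"] produce different digests."""
    h = hashlib.sha256()
    for chunk in chunks:
        data = chunk.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

def embed_if_changed(
    chunks: List[str],
    stored_hash: Optional[str],
    embed: Callable[[List[str]], object],
) -> Tuple[Optional[object], str]:
    """Call the (expensive) embedder only when the input hash differs
    from the hash stored alongside the previously computed embedding.
    Returns (new_embedding_or_None, current_hash)."""
    new_hash = content_hash(chunks)
    if new_hash == stored_hash:
        return None, new_hash  # input unchanged: keep the existing embedding
    return embed(chunks), new_hash

# Usage with a stand-in embedder that just counts invocations.
calls = []
def fake_embed(chunks):
    calls.append(chunks)
    return [len(c) for c in chunks]

emb, h1 = embed_if_changed(["hello", "world"], None, fake_embed)
skipped, h2 = embed_if_changed(["hello", "world"], h1, fake_embed)
```

In this sketch the second call returns no new embedding because the hash matches, so the embedder runs only once.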

Screenshots
Added dashboard.

Environment (please complete the following information):

Google Cloud, GKE, deployed with custom Helm charts.

Vespa version
8.307.19

Additional context
Slack thread.
An interesting discovery: when persearch was reduced from the number of available CPU cores to 1, the reindexing started progressing.

kkraune · 2024-07-31T12:15:22Z

@jonmv can you look at the timeout value?

kkraune assigned baldersheim Mar 13, 2024

kkraune added this to the soon milestone Mar 13, 2024

baldersheim removed their assignment Jul 1, 2024

baldersheim removed this from the soon milestone Jul 1, 2024

kkraune assigned jonmv Jul 3, 2024

kkraune added this to the soon milestone Jul 31, 2024
