Describe the bug
When reindexing is triggered and one or more of the synthetic fields in the indexing scripts invokes `embed`, progress appears to stall.
To Reproduce
Steps to reproduce the behavior:
1. On an existing index with a field like:
   ```
   field chunks type array<string> {}
   ```
2. Add a synthetic field:
   ```
   field colbert type tensor<int8>(context{}, token{}, v[16]) {
       indexing: input chunks | embed colbert context | attribute
   }
   ```
3. Trigger reindexing.
After some initial progress (see the screenshot below), reindexing stops advancing.
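For completeness, the `embed colbert ...` expression above assumes a ColBERT embedder component is configured in `services.xml`. A minimal sketch of such a component, with placeholder model URLs (the actual deployment's models are not shown in this report):

```xml
<!-- Sketch only: embedder component assumed by `embed colbert` in the schema.
     Component id must match the embedder name used in the indexing expression;
     model URLs below are illustrative placeholders. -->
<container id="default" version="1.0">
    <component id="colbert" type="colbert-embedder">
        <transformer-model url="https://example.com/colbert-model.onnx"/>
        <tokenizer-model url="https://example.com/tokenizer.json"/>
    </component>
    ...
</container>
```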
Expected behavior
I understand that inference on CPU takes time and embedding arrays of strings is not the best of ideas.
It would be great to have more control over reindexing:
- Set a per-document timeout.
- Reindexing is a visiting operation, so it would be great to select only a subset of documents to be processed, at the cost of undefined ranking for the rest.
Also, more visibility into progress would be nice. Maybe a count of documents reindexed so far.
Furthermore, if recalculating embeddings for synthetic fields could somehow be skipped when the input is unchanged, e.g. by comparing content hashes, that would also be great.
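For context on triggering and inspecting reindexing on a self-hosted deployment: the config server exposes an application API for this. A sketch, assuming the default tenant/application names, the standard config server port 19071, and placeholder cluster/document-type names:

```shell
# Sketch only: trigger reindexing for one cluster/document type
# (clusterId and documentType are placeholders for this deployment's names)
curl -X POST "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default/reindex?clusterId=mycluster&documentType=doc"

# Poll reindexing status/progress on the same endpoint path
curl "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default/reindexing"
```

The status endpoint is one way to get the per-document-type progress visibility requested above, though a document count in the dashboard would still be more convenient.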
Screenshots
Added dashboard.
Environment (please complete the following information):
Google Cloud, GKE, deployed with custom Helm charts.
Vespa version
8.307.19
Additional context
Slack thread.
An interesting discovery: when `persearch` was reduced from being equal to the number of available CPU cores down to 1, reindexing started progressing.
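For reference, the `persearch` thread count mentioned above corresponds to content-node tuning in `services.xml`. A minimal sketch of setting it to 1 (cluster id is a placeholder; only the relevant tuning elements are shown):

```xml
<!-- Sketch only: reduce per-query thread count on the content nodes.
     "mycluster" is a placeholder for the deployment's content cluster id. -->
<content id="mycluster" version="1.0">
    <engine>
        <proton>
            <tuning>
                <searchnode>
                    <requestthreads>
                        <persearch>1</persearch>
                    </requestthreads>
                </searchnode>
            </tuning>
        </proton>
    </engine>
    ...
</content>
```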