Ollama times out at 5m #628
Expanding on my investigation: the hoarder INFERENCE_JOB_TIMEOUT_SEC variable sets the keep_alive value in the request, which explicitly controls how long the model stays loaded in memory following the request.
So in fact this does the same as OLLAMA_KEEP_ALIVE, but at request level. Looking at the ollama-js library, it seems to be using streaming, so I am not sure where the timeout comes from :(
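For reference, a minimal sketch of what a per-request keep_alive looks like through ollama-js; the model name, prompt handling, and the way the env variables are read are illustrative assumptions, not Hoarder's actual code:

```js
const { Ollama } = require('ollama');

const client = new Ollama({ host: process.env.OLLAMA_BASE_URL });

async function runInference(prompt) {
  // keep_alive only tells the server how long to keep the model loaded
  // after the request finishes; it is not a client-side request timeout.
  const keepAliveSec = parseInt(process.env.INFERENCE_JOB_TIMEOUT_SEC ?? '300', 10);

  const stream = await client.chat({
    model: 'llama3',                 // illustrative model name
    messages: [{ role: 'user', content: prompt }],
    keep_alive: `${keepAliveSec}s`,  // request-level counterpart of OLLAMA_KEEP_ALIVE
    stream: true,                    // ollama-js yields chunks as they are generated
  });

  let text = '';
  for await (const chunk of stream) {
    text += chunk.message.content;
  }
  return text;
}
```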
ok, that's weird. If your job timeout is 20 mins, you shouldn't see inference jobs starting at 5-min intervals. Also, you should have seen at least some errors between the runs. Let's start with the obvious question: did you run 'docker compose up' after updating the env file?
Yeah, ollama is up and reachable. I can see the behavior from my comment above with some of the bookmarks: hoarder tries to run inference on the same bookmark four times, and I always see a 5-min POST in the ollama log. On the last one the exception comes.
I also tried increasing the hoarder timeout to a large value (1h), just to rule out something funny like the 20m timeout being an aggregation of 4×5min.
This is getting weirder every time I look at it. I have been going through the logs and I also see cases like this:
This would support my previous idea that there is some sort of aggregated timeout.
Is there any chance this could have something to do with the retries? (hoarder/packages/shared/queues.ts, line 44 in 8f44c81)
I still can't figure out where the 5 minutes are coming from, however.
I am not sure what you mean by "this".
I am thinking it might be this: the comment suggests that it has nothing to do with anything in ollama or hoarder, but with the default timeout for fetch requests in Node.js, which seems to be 5 minutes.
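To illustrate this theory (a sketch, not code from the thread): Node's built-in fetch goes through undici, whose default Agent ships with headersTimeout/bodyTimeout of 300000 ms, i.e. 5 minutes, and passing a custom dispatcher is what lifts that limit. The URL below is only a placeholder for an endpoint that takes longer than 5 minutes to answer:

```js
const { Agent } = require('undici');

// Placeholder: any endpoint that takes >5 minutes to start responding.
const SLOW_URL = 'http://localhost:11434/api/generate';

async function probe(label, dispatcher) {
  const started = Date.now();
  try {
    await fetch(SLOW_URL, { method: 'POST', body: '{}', dispatcher });
    console.log(`${label}: completed`);
  } catch (err) {
    const secs = Math.round((Date.now() - started) / 1000);
    console.log(`${label}: failed after ${secs}s`, err.cause ?? err);
  }
}

// Default dispatcher: aborts around the 300 s mark with a headers/body timeout error.
probe('default', undefined);
// Relaxed Agent: the same request may run for up to an hour.
probe('relaxed', new Agent({ headersTimeout: 3_600_000, bodyTimeout: 3_600_000 }));
```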
Yeah, I agree. I'm also as surprised as you are that I ended up implementing it that way :D I can fix that.
That indeed looks very likely to be the case.
Sorry, the context was just in my brain and not in the comment. You described pretty much what I meant: that example shows a successful inference and yet there is the 5m timeout. I can't explain why the second request is so short and returns correctly. Your deduction that the Node.js fetch timeout is 5 minutes is what I have been looking for. One thing does not add up for me though: what kind of timeout is it? If it were a connection timeout (like curl's), it would only trigger if the server did not respond within 5 minutes. It looks more like a maximum request duration, but I would expect that not to apply to a streaming response.
@petrm Maybe the time to first token is higher than 5 minutes?
I think this bug is mixing up different timeouts, and other issues involving a 5m timeout are also being referenced. In one case it is a long model loading time; that is not my case, loading is super fast for me. The other one is the Node.js request duration. My ollama actually responds with a 200 at the 5m mark, not with a 500 like in the referenced issue.
I have the same problem and saw a 500 at the 5-minute mark, but sometimes also a 200; the 200s were far more frequent than the 500s.
ok, multiple people are hitting this now, so it's clearly a bug. I'm convinced that it's probably the default fetch timeout thing.
It is the fetch timeout. I "patched" it by doing the following. First, copy the library file out of the container:

```bash
sudo docker cp hoarder-web-1:/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/ollama/dist/shared/ollama.9c897541.cjs ./ollama.9c897541.cjs
```

Then replace the fetch setup in that file with a version that uses a long-lived undici Agent:

```js
const { Agent } = require('undici');

// No-timeout fetch function
const noTimeoutFetch = (input, init = {}) => {
  return fetch(input, {
    ...init,
    dispatcher: new Agent({ headersTimeout: 3600000 }), // 1-hour timeout
  });
};

// in the client setup inside ollama.9c897541.cjs, point the client at the wrapper
this.fetch = noTimeoutFetch;
```

Finally, mount the patched file back over the original in the compose file:

```yaml
services:
  ...
  web:
    ...
    volumes:
      - /docker/appdata/hoarder/data:/data
      - ./ollama.9c897541.cjs:/app/apps/workers/node_modules/.pnpm/[email protected]/node_modules/ollama/dist/shared/ollama.9c897541.cjs
```

Basically just copied the solution someone linked above.
Can confirm that the changes @sbarbett mentioned do indeed resolve the issue. Nevertheless, as far as I can see, this is a change in a library and not in hoarder itself; otherwise I would have done it in a PR.
I agree that we shouldn't be hand-modifying the node modules in the way I described above. The issue needs to be addressed upstream, i.e. someone should open a discussion with the ollama-js maintainers.

An interim solution would be to make a custom wrapper. By default, it will be 5 minutes unless someone adds an INFERENCE_FETCH_TIMEOUT_SEC environment variable. Call this something like customFetch.js:

```js
const { Agent } = require('undici');

// Default timeout of 5 minutes in milliseconds
const defaultTimeout = 5 * 60 * 1000;

// Custom fetch function that uses a configurable timeout
const noTimeoutFetch = (input, init = {}) => {
  // Get timeout from environment variable or use default
  const timeout = process.env.INFERENCE_FETCH_TIMEOUT_SEC
    ? parseInt(process.env.INFERENCE_FETCH_TIMEOUT_SEC, 10) * 1000
    : defaultTimeout;

  // Set up an Agent with the configured timeout
  return fetch(input, {
    ...init,
    dispatcher: new Agent({ headersTimeout: timeout }),
  });
};

module.exports = { noTimeoutFetch };
```

Then wherever the Ollama client is initiated in the Hoarder code, this needs to be imported and passed to the constructor:

```js
const { noTimeoutFetch } = require('../utils/customFetch'); // Adjust path as necessary
// ...
const { Ollama } = require('ollama');

const ollamaClient = new Ollama({
  host: appConfig.OLLAMA_BASE_URL,
  fetch: noTimeoutFetch, // Use custom fetch here
});
```

In the YAML we'd just add this to the environment like so:

```yaml
services:
  web:
    environment:
      - INFERENCE_FETCH_TIMEOUT_SEC=3600 # 1 hour fetch timeout
```

Down the road, if they ever implement a global timeout, you'd just get rid of the custom wrapper and map the environment variable to it.
I wrote a custom wrapper to fix this bug and submitted a pull request.
Describe the Bug
I am running AI tagging with Ollama and have set the timeout to INFERENCE_JOB_TIMEOUT_SEC=1200. This is not respected: if the query runs longer than 5 minutes, it just fails without any error message and runs the inference job again.
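For reference, a minimal compose sketch of how this variable is set (service name assumed):

```yaml
services:
  web:
    environment:
      - INFERENCE_JOB_TIMEOUT_SEC=1200 # 20 minutes, yet jobs still die at ~5 minutes
```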
I looked through the ollama bug tracker and there have been similar issues reported. They were all resolved with the message that it is the client closing the connection and it was suggested to switch to a streamed response.
I also checked the ollama source code for any timeouts and found out that OLLAMA_KEEP_ALIVE has a default value of 5m. Changing this to a larger value does not have any effect on this issue.
Using curl against ollama confirms that queries running longer than 5 minutes can be completed successfully:
Steps to Reproduce
Set INFERENCE_JOB_TIMEOUT_SEC to a value >5m and run an inference job longer than 5m with ollama.
Expected Behaviour
Inference completes successfully.
Screenshots or Additional Context
No response
Device Details
No response
Exact Hoarder Version
v18