Metrics endpoint timeout #15
Comments
Could you give a bit more information about the version you are running?
Hi Abhishek, I'm using the following versions:
Also, canary_dag is created and running properly.
I have not been able to reproduce this. Are there any logs or stacktraces when you try to access the metrics endpoint?
Hi Abhishek, after some more digging I was able to find the issue. It turns out the timeout is caused by the webserver waiting too long for a response from the database. The code fails in this part:
Basically, due to the lack of filtering on the Airflow task_instance table, the query was pulling more data than it could handle (this Airflow instance has been running for 4+ years with hundreds of DAGs). I manually changed the code on my side to pull only the latest 14 days of data: .filter(text("execution_date > NOW() - interval '14 days'"),) It's not a good solution at all, but maybe it would make sense to add a parameter that makes the window configurable. For now I'll just deploy the changed version on my side, since it fixed our issue.
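For anyone who wants to try the same workaround, here is a minimal sketch of what a configurable lookback filter could look like, assuming the exporter aggregates TaskInstance rows through Airflow's SQLAlchemy session. The function name, the LOOKBACK_DAYS default, and the selected columns are illustrative, not the plugin's actual code:

```python
from sqlalchemy import func, text

from airflow import settings
from airflow.models import TaskInstance

# Illustrative default: how far back to look when aggregating task states.
# Exposing this as a plugin/env setting would make the window configurable.
LOOKBACK_DAYS = 14


def get_recent_task_state_counts(lookback_days=LOOKBACK_DAYS):
    """Count task instances per (dag_id, state), limited to a recent window
    so the query stays cheap on long-lived metadata databases."""
    session = settings.Session()
    try:
        return (
            session.query(
                TaskInstance.dag_id,
                TaskInstance.state,
                func.count(TaskInstance.task_id).label("count"),
            )
            # Same idea as the manual patch above, just parameterised.
            # The NOW() / interval syntax is Postgres-specific.
            .filter(text(
                "execution_date > NOW() - interval '{} days'".format(lookback_days)
            ))
            .group_by(TaskInstance.dag_id, TaskInstance.state)
            .all()
        )
    finally:
        session.close()
```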
Running into the same issue as mentioned above. Would be great if this could be resolved soon. Thanks!
Same issue here: [2020-11-12 16:09:58,644] {{security.py:328}} INFO - Cleaning faulty perms
To help those still suffering from this: we deployed a much more stable solution using https://github.com/wrouesnel/postgres_exporter alongside our Airflow instance. Since we run Airflow on Postgres anyway, we can collect all metrics via SQL queries defined in a config.yml-style file. Scraping takes at most 30s for everything, so it has been much more stable in our case.
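If you want to go the same route, here is a sketch of what one such custom query could look like in postgres_exporter's queries file (loaded with --extend.query-path). The table and column names assume the standard Airflow metadata schema, and the 14-day window mirrors the workaround above; this is not the poster's actual config:

```yaml
# queries.yaml -- illustrative custom query for wrouesnel/postgres_exporter.
# Exposes airflow_dag_run_count{dag_id=..., state=...} from the Airflow metadata DB.
airflow_dag_run:
  query: "SELECT dag_id, state, COUNT(*) AS count FROM dag_run WHERE execution_date > NOW() - interval '14 days' GROUP BY dag_id, state"
  metrics:
    - dag_id:
        usage: "LABEL"
        description: "DAG id"
    - state:
        usage: "LABEL"
        description: "DAG run state"
    - count:
        usage: "GAUGE"
        description: "Number of DAG runs per state over the last 14 days"
```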
Any updates here? The exporter is basically worthless once the SQL DB reaches a certain size. Queries take too long and the container uses too much memory.
I've deployed the new plugin to replace airflow-exporter on our Airflow server, but for some reason I can't make it work. I've checked the dependencies (airflow, prometheus_client) and everything is satisfied.
The only thing I'm able to see is that the gunicorn webserver processes time out at some point:
Also, trying to curl it from a client times out with no additional information. Is this a known bug, and is there a workaround/fix for it? Thanks!