Arturo Filastò edited this page Jan 14, 2020 · 13 revisions

Handling a stuck pipeline due to slow rsync

Recently the pipeline has often gotten stuck during the rsync process. The symptom is that the fetcher task is still running while the hist_canning workflow is marked as failed.

See: https://github.com/ooni/sysadmin/issues/403

To fix it:

  1. Connect to datacollector and kill rsync:
ssh datacollector.infra.ooni.io
datacollector:~$ sudo pkill rsync
  2. Once rsync has been killed, check in the Airflow UI that the task is marked as up-for-retry, then wait for it to finish. Tip: sometimes you need to kill rsync a couple of times before it gets a socket with decent throughput.

  3. Once it has finished, restart the hist_canning DAG. See the screenshots below:

Screenshot 2020-01-14 at 17 04 45 Screenshot 2020-01-14 at 17 04 51

Screenshot 2020-01-14 at 17 04 56 Screenshot 2020-01-14 at 17 05 01
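
Before running pkill in the first step, it can help to see which rsync processes exist and how long they have been running. The following is a minimal sketch, not part of the original procedure: the function name `show_procs` is hypothetical, and it assumes that `pgrep` and `ps` are available on datacollector.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OONI tooling): list processes with
# an exact name match together with their elapsed running time.
show_procs() {
    name="${1:-rsync}"
    # Collect the matching PIDs as a comma-separated list for `ps -p`.
    pids=$(pgrep -x "$name" | paste -sd, -)
    if [ -z "$pids" ]; then
        echo "no $name process running"
        return 1
    fi
    # etime is the elapsed time since the process started.
    ps -o pid,etime,args -p "$pids"
}
```

Calling `show_procs rsync` before and after `sudo pkill rsync` shows whether the old transfer is gone and whether the retried task has started a fresh one. A large elapsed time on a transfer that should be short is a hint that rsync is stuck again.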
