We need to upgrade our testing infrastructure soon-ish. #139

Closed · ali-ramadhan opened this issue Mar 19, 2019 · 14 comments · Fixed by #872
Labels: help wanted 🦮 plz halp (guide dog provided) · testing 🧪 Tests get priority in case of emergency evacuation

Comments

@ali-ramadhan (Member) commented Mar 19, 2019

Right now all our tests are lumped into a single suite (unit, integration, and model verification tests together), and we run that suite on both the CPU and the GPU (most tests are shared).

This is not a high-priority item right now, but it's already annoying that I have to wait several minutes for the GPU tests to run while I'm debugging, so I'm just starting a discussion around this topic.

I can see us hitting some limitations soon:

  1. A comprehensive test suite will take long enough to run that we cannot keep rerunning it during development and debugging.
  2. Comprehensive model verification tests (or system tests?) will take even longer to run and are absolutely crucial (see Model verification tests #81 and Validation and performance benchmarks #136), so this problem will get worse in the future.
  3. GPU tests take a while to run because of long compile time (How to reduce compile time for GPU code? #66) and they run on top of all the CPU tests. In general, setting up GPU models takes more time, so it's not ideal that we're setting up tons of tiny models for testing. Testing GPU stuff may also involve some expensive scalar CUDA operations (see Disable slow fallback methods for CUDA #82); a guard against this is sketched below.
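
The guard is cheap: disallow scalar indexing for the whole test run so any slow fallback errors out immediately instead of silently crawling. A minimal sketch, assuming the `allowscalar` switch from today's CUDA.jl (at the time of this issue the equivalent lived in CuArrays.jl):

```julia
using CUDA, Test

# Make scalar indexing of GPU arrays an error for the whole test run,
# so element-by-element fallbacks error out instead of silently running slowly.
CUDA.allowscalar(false)

@testset "no accidental scalar GPU operations" begin
    a = CUDA.zeros(16)

    a .= 1f0                              # broadcasted kernels are fine
    @test sum(a) == 16f0                  # GPU reductions are fine

    @test CUDA.@allowscalar(a[1]) == 1f0  # scalar access must be opted into explicitly
    @test_throws ErrorException a[1]      # anything else now throws instead of crawling
end
```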

We will also need to run the test suite on the following architectures in the future:

  1. single-core CPU (Travis CI and Appveyor are fine here)
  2. single GPU (JuliaGPU's GitLab CI pipeline works great here)
  3. multi-core single CPU (MPI) (paid CI plans will probably work here)
  4. multiple distributed CPU nodes (MPI) (no idea where to run this)
  5. multiple GPUs (MPI) (no idea where to run this)

Some ideas for things to do that will help:

  1. Explicitly split the tests up into 2-3 suites (see the sketch after this list):
    1.1. Unit tests: should run in a few minutes so we can run them during development and on every commit/PR/etc.
    1.2. Integration tests: can take a while to run, so we don't want to run them locally all the time, but they should probably run on every PR. Shouldn't take much more than 1 hour to run so we don't have to wait forever to merge PRs.
    1.3. Model verification tests (also called end-to-end tests): will probably take a long time to run. Maybe run these once a day? Or manually if there's a PR that changes core functionality.
  2. Run the tests in parallel. I think the main Julia repo does this. We might have to roll our own parallel solution (see this thread). This would also require expensive paid CI plans (but very much worth it in my opinion).
  3. Thinking long-term, if we had a multi-CPU multi-GPU machine available we could probably roll our own CI solution for these distributed architectures. Ideally we'd want to see if we can get this through a service, although it would probably cost $$$$$.
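
A minimal sketch of how the suite split could be driven from `test/runtests.jl` with an environment variable; the variable name `OCEANANIGANS_TEST_GROUP` and the test file names are hypothetical, just to show the shape:

```julia
# test/runtests.jl (sketch) -- select a suite via an environment variable set by CI,
# e.g. OCEANANIGANS_TEST_GROUP=unit julia --project -e 'using Pkg; Pkg.test()'
using Test

group = get(ENV, "OCEANANIGANS_TEST_GROUP", "all")

@testset "Oceananigans" begin
    if group in ("unit", "all")
        include("test_fields.jl")               # fast, run on every commit
        include("test_operators.jl")
    end
    if group in ("integration", "all")
        include("test_time_stepping.jl")        # slower, run on every PR
    end
    if group in ("verification", "all")
        include("test_model_verification.jl")   # slow end-to-end runs, nightly
    end
end
```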

cc @christophernhill @jm-c @glwagner: I know we all care about testing.

cc @charleskawczynski @simonbyrne: Just wondering if this is a problem you guys are anticipating for CliMA.jl? We might be able to share some common solutions?

Resources:

ali-ramadhan added the testing 🧪 and help wanted 🦮 labels on Mar 19, 2019
@simonbyrne (Member) commented:

I think that's a good summary of the issues facing the Clima repo as well.

@vchuravy might be able to expand more here, but from what I understand the JuliaGPU org is using their own box at UGent hooked up as a runner via GitLab CI. We are currently considering getting a similar setup here at Caltech; it might also be usable for multi-CPU/multi-GPU jobs.

@simonbyrne (Member) commented:

> We are currently considering getting a similar setup here at Caltech; it might also be usable for multi-CPU/multi-GPU jobs.

I should add that if/when we get it set up, you of course would be welcome to make use of it!

@ali-ramadhan (Member, Author) commented:

That would be awesome! We'll definitely keep a lookout for what you guys end up using.

I wonder if it would be more cost-effective (and a better use of developer time) to just pay a CI service for this, but for such an expensive setup it might not be worth it.

@simonbyrne (Member) commented Mar 20, 2019 via email

@ali-ramadhan (Member, Author) commented:

Yeah, I don't know... I was going to email around for quotes to see if any of the CI services have premium/custom setups with GPUs and multiple CPUs.

If they're just spinning up VMs on the cloud, then maybe it's as simple as requesting a multi-core CPU instance with 2-4 GPUs (which I know is available on Google Cloud). But after factoring in support costs it might be pretty steep.

@christophernhill (Member) commented Mar 20, 2019 via email

@ali-ramadhan (Member, Author) commented Mar 20, 2019

Thanks, I forgot about Circle CI. I just emailed around for quotes from Travis CI, Drone, GitLab CI, and Circle CI. It sounds like enterprises tend to roll their own CI using Jenkins or TeamCity, but we're just a small team that needs a custom solution, so something simple like Travis CI might be fine.

It sounds like we can spin up our own cloud instances (e.g. on Google Cloud with those sweet credits) according to the specs we need, then pay the CI service to basically set them up, maintain them, and provide support.

@ali-ramadhan (Member, Author) commented Apr 19, 2019

I wonder how hard it would be to spin up a Google Cloud instance with a V100 GPU (or something cheaper, doesn't matter too much since we have enough credits) and set up a GitLab CI pipeline with it just like the one JuliaGPU has. We could share it with the JuliaGPU organization as well.

And if we need 2+ GPUs to really test MPI, that would be easy to change (just spin up a new instance and load the "GitLab CI" image, maybe).

It wouldn't run the tests on Windows or Mac, but we can pay a little bit more for dedicated Travis (Mac?) and Appveyor (Windows) resources if we want those to run fast as well.

cc @vchuravy: is this easy-ish to set up? I think you were involved in setting up the current GitLab CI pipeline?
cc @jkozdon since your Slack post reminded me about this issue.

See: https://github.com/JuliaGPU/gitlab-ci

@jkozdon commented Apr 19, 2019

I like the idea of running on Google. Setting up runners should be easy, I think.

cc @lcw

@lcw commented Apr 19, 2019

Yeah, I like the points @vchuravy brought up in the weekly atmosphere meeting. He suggested that using Google Cloud would allow us to test the codes at various scales, from one GPU to hundreds.

@ali-ramadhan (Member, Author) commented Dec 18, 2019

This issue is cropping up now that we regularly time out on Travis (max runtime is 50 minutes) and almost always time out on GitLab GPU CI (max runtime is 60 minutes; @maleadt might be able to increase that, but it's a shared resource and we probably shouldn't be hogging it). Surprisingly, Appveyor is always fast now. I think free CI servers are just generally underpowered.

We definitely want to keep our tests and make them even more comprehensive, so here are some ideas we can discuss (probably in January):

  1. See if we can move the Travis CI pipelines onto Azure DevOps. They seem to give out more runtime (up to 360 minutes, I think), although they might reduce that in the future if they get more users. CliMA and @simonbyrne seem to be having a good experience with Azure.
  2. Split the tests into a smaller, fast test set (regression only?) and the full comprehensive test set. But we still need a place to run the comprehensive test set (maybe Azure runs the comprehensive tests?). We'll probably have to do this at some point.
  3. Split up the tests into jobs that run in <50 minutes each (see the sketch below). You can have unlimited jobs on Travis, but this feels like a lot of work to set up, and the tests would still take a long time since you can't have that many parallel builds.
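
One low-effort version of that splitting, sketched here under the assumption that each CI job exports hypothetical TEST_JOB_INDEX / TEST_JOB_COUNT environment variables (not something we have today), is to deal the test files round-robin across the jobs:

```julia
# test/runtests.jl (sketch): partition test files across parallel CI jobs.
using Test

# Each CI job would export something like TEST_JOB_INDEX=2 TEST_JOB_COUNT=3.
job_index = parse(Int, get(ENV, "TEST_JOB_INDEX", "1"))
job_count = parse(Int, get(ENV, "TEST_JOB_COUNT", "1"))

# Hypothetical test files, ordered roughly slowest-first so the
# round-robin deal balances job runtimes.
test_files = ["test_turbulence_closures.jl", "test_time_stepping.jl",
              "test_fields.jl", "test_operators.jl", "test_utils.jl"]

# Job i runs files i, i + job_count, i + 2*job_count, ...
my_files = test_files[job_index:job_count:end]

@testset "Oceananigans (job $job_index of $job_count)" begin
    for file in my_files
        include(file)
    end
end
```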

We'll have to test Oceananigans + MPI pretty soon, but we can worry about that later. Slurm CI or setting something up with our 4x Titan V server might be a good option here.

@johncmarshall54 commented Dec 18, 2019 via email

@ali-ramadhan (Member, Author) commented:

> Why do model tests have to be long runs? Surely a few timesteps is enough to see if anything is broken.

That is true, but we already do this. Most tests run very small models for a single time step. Some run for a bit longer to test e.g. incompressibility or tracer conservation over time, but even then it's like 10 time steps, and those tests don't take very long.

The problem is just the sheer number of tests, as we try to test each feature on CPU and GPU, with Float32 and Float64, with every closure, etc. We've been adding tests over time, so we currently have ~2000 tests in total (counting GPU tests too). Julia's compiler takes a while to compile everything, which doesn't help. The tests run in 15-20 minutes on my laptop, but the free CI servers aren't as powerful, so I'm not surprised the tests take over 50-60 minutes.
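
To make the combinatorics concrete, here is a sketch of the kind of parameterized test loop involved (the parameter axes below are illustrative, not our actual test code); even three small axes already multiply into a dozen model setups:

```julia
using Test

# Illustrative parameter axes; every extra axis multiplies the number of
# tiny models that have to be built, compiled, and time stepped.
archs       = (:CPU, :GPU)
float_types = (Float32, Float64)
closures    = (:ConstantIsotropicDiffusivity, :SmagorinskyLilly, :AnisotropicMinimumDissipation)

@testset "Time stepping [$arch, $FT, $closure]" for arch in archs, FT in float_types, closure in closures
    # Build a tiny model for this combination and take a few time steps...
    # (details elided; 2 × 2 × 3 = 12 setups from just these three axes).
    @test true
end
```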

We're only going to be adding more tests in the future.

@ali-ramadhan (Member, Author) commented:

Gonna close this issue with PR #872 as there's not much to do and no actionable items.

Unless we get paid-tier CI, we'll probably stick with Travis CI (Linux + Mac CPU + doc builds), GitLab CI (Linux CPU + GPU), Appveyor CI (Windows CPU), and Docker CI. For MPI (#590) we'll probably have to look into https://github.com/CliMA/slurmci.

> it's already annoying that I have to wait several minutes for the GPU tests to run

Haha those were good times.
