add arm support #35
The last build failed after 4 hours... The error is from SeisSol with:
More details can be found in the log: https://github.com/SeisSol/Training/actions/runs/4899728175/jobs/8749902315. The problem seems to be from this line in SeisSol: The Any thoughts on how to get over this? @sebwolf-de @Thomas-Ulrich |
The build took more than 6 hours, so it was cancelled by GitHub... |
Took me a few hours to build on my laptop, and I had the container pushed to docker hub here Note that because this build is compiled with I tested in the emulator on my laptop using the tpv13 notebook. The gmsh, pumgen and vtk steps went through, but it failed at running seissol, with the following error:
It seems to me that this could be an issue related to the qemu emulator and might go away when running on arm64 natively. Could you please confirm? Thank you! |
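One way to tell in advance whether an image will run natively or through QEMU is to compare the host architecture with the image architecture. The following is a minimal sketch, not part of this PR; the function names `normalize_arch` and `needs_emulation` are made up for illustration.

```shell
# Map a `uname -m` style name to the Docker/OCI architecture name.
# Linux reports aarch64 on 64-bit Arm, macOS reports arm64.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *)             echo "$1" ;;
    esac
}

# Return 0 if running an image built for IMAGE_ARCH ($2) on HOST_ARCH ($1)
# would require emulation, 1 if the two architectures match.
needs_emulation() {
    [ "$(normalize_arch "$1")" != "$(normalize_arch "$2")" ]
}
```

On a real host this could be called as `needs_emulation "$(uname -m)" "$(docker image inspect --format '{{.Architecture}}' IMAGE)"`, where `IMAGE` is a placeholder for a locally available image.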
Grabbed an arm instance on AWS to test the container. However, seissol fails at the same step with a segfault. Maybe it has something to do with the compile flags? Note that I simply removed the |
Tried another build natively on the arm64 node on AWS, but the run still fails with the same error. At this point, I believe there is something wrong with seissol itself. Not sure how to proceed... Below is the error message:
Note that the proxy code runs all fine:
btw, the build from #33 fails even with the proxy code because of the invalid avx2 instructions:
So, the image built here does run properly on arm64. The error is likely due to SeisSol itself. |
I should probably take the above back. I ran the three other cases in the container and found that they failed at different steps (all at the very beginning, though). This suggests that the issue might be memory related: the arm64 instance I got only has 4 GB of memory, which may not be enough for the run. Is that true? Do you have an estimate of the memory requirement for these runs? A close monitor with the |
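To rule memory in or out before starting a run, the total memory visible inside the (Linux) container can be checked first. This is only a sketch: the thread gives no memory estimate for these cases, so the 8 GiB default below is a placeholder and not a measured SeisSol requirement, and both function names are hypothetical.

```shell
# Print MemTotal in whole GiB (rounded down) from a meminfo-style file.
# Defaults to /proc/meminfo, which exists on Linux only.
mem_total_gib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Return 0 if at least MIN_GIB ($1) of memory is present, 1 otherwise.
# The default of 8 is a placeholder, NOT a known SeisSol requirement.
has_enough_memory() {
    local min_gib="${1:-8}"
    local meminfo="${2:-/proc/meminfo}"
    local total
    total="$(mem_total_gib "$meminfo")"
    [ "${total:-0}" -ge "$min_gib" ]
}
```

For example, `has_enough_memory 8 || echo "less than 8 GiB available"` prints a warning on a small instance such as the 4 GB one mentioned above.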
Note that the arch "thunderx2t99" may also work on M1/M2 chips. Definitely not optimal, but it should at least activate vectorization. We can also add similar settings for M1/M2, but this is definitely not a priority for us. I'm also not surprised that building a container with QEMU is taking a long time... |
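The architecture choice discussed here could be made per machine type when configuring the build. The sketch below is an illustration under assumptions: `hsw` for x86-64 is inferred from the `build_hsw` directory name in the Dockerfile, `noarch` is the arm64 value used in this PR, and `thunderx2t99` is the optional alternative suggested in the comment above. The function names are hypothetical, and nothing is configured or built.

```shell
# Pick a HOST_ARCH value from a `uname -m` style machine name ($1).
# $2 optionally overrides the Arm choice (default: noarch).
host_arch_for() {
    local machine="$1"
    local arm_arch="${2:-noarch}"   # e.g. thunderx2t99 to try vectorized kernels
    case "$machine" in
        x86_64|amd64)  echo "hsw" ;;        # assumption, from build_hsw
        aarch64|arm64) echo "$arm_arch" ;;
        *)             echo "noarch" ;;     # conservative fallback
    esac
}

# Print the matching CMake option as a string.
host_arch_flag() {
    printf '%s\n' "-DHOST_ARCH=$(host_arch_for "$@")"
}
```

A Dockerfile step could then pass `$(host_arch_flag "$(uname -m)")` to `cmake` instead of hard-coding the value.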
I did not use libxsmm in this build as I thought libxsmm does not support arm. Then I found that this is not accurate: there is no support in any of the released versions, but the development version does seem to have it. Still, I wanted to play it safe, so I used Eigen instead. I am not sure noarch will actually impact performance much on Apple Silicon, because the chip does not have SVE anyway. I think the compiler should enable SIMD optimization by default. I don't have the time to test it out, but since this version of the container is meant to make the training material available to the majority, I don't think performance is the priority anyway. I am back in my office, so I was able to test it on an M1 arm MacBook. It turns out that the tpv13 run still segfaults at the same place, but I was able to get the Kaikoura case running. Not sure what the expected performance should be, but below is a run with 4 OpenMP threads:
I also tested the three other cases and found that the sulawesi case failed at:
The Northridge case started the calculation, but failed with the Inf/NaN error:
So, there are still issues with the SeisSol build, but the container is built properly for the arm architecture. |
Just another update: I ran the Kaikoura case again, and the performance is significantly improved. It makes more sense now.
|
The latest release has (undocumented) support for Arm but only for selected CPUs. It may not work for Apple silicon.
They should have NEON support at least. I don't know what code the compiler is going to emit for Arm architectures without a specified tuning target. It likely is going to be suboptimal. |
@wangyinz could you please rebase this onto the current main branch? IMHO this makes the review easier :D |
Force-pushed from 6f5cf51 to 69c4df8.
(Might break some configurations on Intel hardware but might help on Arm) |
I already had that set to get a successful build: Training/Dockerfile_jupyterlab Lines 105 to 108 in 69c4df8
Also, the one in SConscript also needs to be removed.
|
The only other thing that arch does is to specify the alignment: The value in the SConscript doesn't matter, it isn't used anymore. |
So, Somehow I thought the one in SConscript gave me an error, but maybe I remembered wrong. |
Here is an overview of my private M1 testing of the current PR:
|
My few remarks:
|
Isn't Northridge the only scenario that doesn't use dynamic rupture? |
Indeed, it's the only scenario without DR, but Kaikoura works for a few minutes, so DR is not completely broken. |
I'm also not sure why Northridge runs only with attenuation. In the current implementation, viscoelasticity uses the same wave propagation kernels as the elastic code. |
I can reproduce the segfaults, even when using a specific Apple M2 arch setting. I have no idea why, it seems to run well without Docker. I'll investigate. |
A small side comment, the "no redzone" fixes should not be necessary anymore; noarch now doesn't add that parameter anymore by default. (at least when using the latest master, v1.1.0 doesn't have that change in yet; EDIT: v1.1.1 contains that patch) |
The segfaults with tpv13 could be due to ASAGI, or the SeisSol ASAGI reader—even though ASAGI is not even used there. But: it's compiled into the binary. Thus, ASAGI is called here https://github.com/SeisSol/SeisSol/blob/master/src/Reader/AsagiReader.h which in turn is called by https://github.com/SeisSol/SeisSol/blob/313c4e4c459b1ea67302b8887650f51d1ebbf9e7/src/Initializer/ParameterDB.cpp#L626 when initializing an easi model. And the last message you'd see before ending up there is exactly a warning like "falling back to materials sampled from cell barycenters". |
I can reproduce the crashes but somehow have a very hard time debugging them due to an unrelated issue :( |
I tried a few different combinations and we learned that:
The latest PSpaMM also has issues and we have to use In any case, arm users can use the latest build pushed to Docker Hub: https://hub.docker.com/r/wangyinz/seissoltraining/tags with the command
The tpv13 case ran successfully at almost 40 GFLOP/s on an M1 MacBook:
|
It should be noted that using PSpaMM together with |
Quite surprisingly, the multi-arch docker build finished! Note that this branch has the setup to build both amd64 and arm64 architectures in the same image (which is here already). Previously the arm64 build ran so slowly that it could not finish within the 6 hour limit of the runners. I guess GitHub has upgraded the runners, and now the workflow finishes in less than 4 hours. Still slow, but we can now ask the attendees to pull the same image regardless of the arch they need. |
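For reference, a two-architecture image of this kind is commonly produced with `docker buildx build` and a comma-separated `--platform` list. The sketch below only prints such a command so it can be inspected first. The helper name `buildx_command` and the example tag are placeholders, and the actual workflow in this branch may use different options.

```shell
# Print (do not run) a buildx command that builds one image for both
# amd64 and arm64 and pushes the resulting multi-arch manifest.
# $1: image tag (placeholder), $2: Dockerfile path (default: Dockerfile)
buildx_command() {
    local tag="$1"
    local dockerfile="${2:-Dockerfile}"
    printf 'docker buildx build --platform %s --file %s --tag %s --push .\n' \
        "linux/amd64,linux/arm64" "$dockerfile" "$tag"
}
```

For example, `buildx_command example/seissoltraining:latest Dockerfile_jupyterlab` prints the full command line. Running it on an amd64 host requires QEMU emulation for the arm64 half, which is why such builds are slow.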
Great to see that! ... That reminds me... There was a doubling of the cores for the runners recently: https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/ |
Hi, I am still making my way to Seattle. Does this mean we don't need my M1 test build anymore?
|
I have tested on my m1 macbook and can confirm this build does not have the NaN error (at least not in the tpv13 case). |
&& cd SeisSol \
&& mkdir build_hsw && cd build_hsw \
&& export PATH=$PATH:/home/tools/bin \
&& CC=mpicc CXX=mpicxx cmake .. -DCMAKE_PREFIX_PATH=/home/tools -DGEMM_TOOLS_LIST=PSpaMM -DHOST_ARCH=noarch -DASAGI=on -DNETCDF=on -DORDER=4 \
Consider adding -DDR_QUAD_RULE_OPTIONS=dunavant if we want to harmonize the Dockerfiles and decrease run times.
As a note, I can reproduce PSpaMM+neon failing with an inf/nan, while emulating the system with QEMU on an X86-64 machine. Maybe it is indeed possible to debug the ARM container (albeit slow, with crashing Python) for us non-mac users. |
This branch is a work in progress to add an arm64 native build of the container. This version simply uses an MPICH library, so it will not run on HPC systems like #33. We should eventually build two parallel versions to support different architectures.