Crash in Euphonic 1.3.2 on IDAaaS for large number of q-points #356

Open
mducle opened this issue Jan 21, 2025 · 3 comments
Labels: bug (Something isn't working)

Comments

@mducle
Member

mducle commented Jan 21, 2025

@davidvoneshen reported a bug (Horace-euphonic-interface#40) in which using Euphonic together with Horace with a large number of q-points per chunk causes a segfault.

The bug seems to be in the Euphonic C-code but I'm not exactly sure where.

The system is relatively large with 80 atoms in the supercell, and the crash occurs when the chunk size is set larger than around 1.15e6 q-points.

mducle added the bug label Jan 21, 2025
@ajjackson
Collaborator

ajjackson commented Jan 22, 2025

Is this simply a "running out of memory" issue that should be addressed by using a smaller chunk size? Where is that determined?

Update:
ah, I see from the linked issue that the problem actually emerges when more memory is available. So more likely to be a pointer arithmetic issue. Capping the chunksize automatically might still be a reasonable solution, but maybe we just need some bigger/unsigned integers.

I guess the next step is to put together a minimal example that can be used as a failing test.

@mducle
Member Author

mducle commented Jan 22, 2025

Yes, I'll try to get a smaller test up (without Horace). In any case I think it's probably a good idea to use ptrdiff_t or size_t instead of int.
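To illustrate the suspected failure mode, here is a minimal sketch rather than Euphonic's actual code. It assumes a hypothetical flat array that stores `elems_per_q` values for each q-point; the function names and the figure of 2000 elements per q-point are invented for illustration. With a 32-bit `int`, the product of q-point count and elements per q-point can exceed `INT_MAX`, whereas the same offset computed in `size_t` stays valid on a typical 64-bit platform.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical layout: one flat array with elems_per_q values per q-point.
 * Neither function is taken from Euphonic. */

/* Returns 1 if the total element count n_qpts * elems_per_q fits in a
 * signed int, 0 otherwise. Dividing instead of multiplying means the
 * check itself cannot overflow. */
int offset_fits_in_int(size_t n_qpts, size_t elems_per_q)
{
    if (n_qpts == 0 || elems_per_q == 0) {
        return 1;
    }
    return elems_per_q <= (size_t)INT_MAX / n_qpts;
}

/* Offset of element `elem` belonging to q-point `q`, computed entirely
 * in size_t so that no intermediate value passes through int. */
size_t flat_offset(size_t q, size_t elems_per_q, size_t elem)
{
    return q * elems_per_q + elem;
}
```

Under these assumptions, `offset_fits_in_int(1150000, 2000)` returns 0 on a platform with 32-bit `int`, while `flat_offset` still yields a usable index where `size_t` is 64 bits wide. Whether this is the real cause of the segfault is not established in this thread.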

The chunk size is an input of the euphonic (Python) function but is set automatically in the Horace (Matlab) code based on the amount of system free memory - so for a large instance with 480GB of memory it just passes all the q-points requested in one chunk.

@ajjackson
Collaborator

ajjackson commented Jan 22, 2025

> Yes, I'll try to get a smaller test up (without Horace). In any case I think it's probably a good idea to use ptrdiff_t or size_t instead of int.

Yes, this is best practice for pointers anyway.

> The chunk size is an input of the euphonic (Python) function but is set automatically in the Horace (Matlab) code based on the amount of system free memory - so for a large instance with 480GB of memory it just passes all the q-points requested in one chunk.

I see! It shouldn't be too difficult to build in a cap if that turns out to be necessary/sensible then. We probably don't want to do all the pointer arithmetic with long types just to save a few (already huge) chunks. (I think size_t is as long as possible anyway so perhaps that's a non-issue.)
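A cap of the kind mentioned above could look roughly like the following sketch. It is not part of Euphonic or Horace: `cap_chunk_size` and `n_chunks` are hypothetical helpers, and the idea that the limit is set by the total element count fitting in an `int` is an assumption that would need confirming against the real C code.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: limit a requested chunk size so that
 * chunk * elems_per_q never exceeds INT_MAX. */
size_t cap_chunk_size(size_t requested, size_t elems_per_q)
{
    if (elems_per_q == 0) {
        return requested;  /* nothing is stored per q-point, no limit */
    }
    size_t max_chunk = (size_t)INT_MAX / elems_per_q;
    return requested < max_chunk ? requested : max_chunk;
}

/* Number of chunks needed to cover n_qpts q-points (ceiling division).
 * Returns 0 when the chunk size is 0. */
size_t n_chunks(size_t n_qpts, size_t chunk)
{
    if (chunk == 0) {
        return 0;
    }
    return (n_qpts + chunk - 1) / chunk;
}
```

With the illustrative figure of 2000 elements per q-point, a request for 1150000 q-points would be split into two chunks instead of being passed through as one.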
