HDF5 errors when writing Silo files #243

Open
eschnett opened this issue Feb 2, 2022 · 9 comments

Comments

@eschnett

eschnett commented Feb 2, 2022

I receive the following (harmless?) HDF5 errors when writing Silo files. I am using Silo 4.11 and HDF5 1.12.1. HDF5 is configured with MPI. The relevant error message seems to be:

H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5 
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 4:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 6:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 7:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 5:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 2:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 3:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 1:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 0:
  #000: /tmp/eschnetter/spack-stage/spack-stage-hdf5-1.12.1-wsry65t5gtuhsfuyi4gpmaol6q2cvxxl/spack-src/src/H5Pfapl.c line 4553 in H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5
    major: Property lists
    minor: Feature is unsupported
@markcmiller86
Member

@eschnett thanks for the report. Which DB_XXX driver arg are you using in the DBCreate() call? I would expect just DB_HDF5, but if it is something else, please mention it here.

@eschnett
Author

eschnett commented Feb 2, 2022

Yes, I am using DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL, output_file.c_str(), DB_HDF5).
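In context, the call is roughly the following (a simplified sketch with placeholder names; the actual mesh-writing calls are omitted):

```c++
// Simplified sketch of how the file is created (placeholder names, not the
// actual application code).
#include <silo.h>
#include <string>

int write_file(const std::string &filename, const std::string &output_file)
{
    // DB_HDF5 selects the (serial) HDF5 driver.
    DBfile *file = DBCreate(filename.c_str(), DB_CLOBBER, DB_LOCAL,
                            output_file.c_str(), DB_HDF5);
    if (file == nullptr)
        return -1;  // DBCreate returns NULL on failure

    // ... DBPutQuadmesh / DBPutQuadvar calls omitted ...

    return DBClose(file);
}
```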

@markcmiller86
Member

Ok, so you've linked against an HDF5 installation that is compiled for parallel. That is fine. But Silo is a serial library and will only ever open serial HDF5 files, so the fact that the HDF5 library is complaining about that seems off.

@eschnett do you by any chance also manipulate any HDF5 files directly from the application where you are seeing this message?

@brtnfld I am wondering whether the error message (in the orig. comment above) is potentially bogus? I am pretty sure that Silo is opening only serial HDF5 files, but the user is reporting that the HDF5 lib is complaining about Silo's use of the evict-on-close feature. Now, the application itself is indeed parallel, but I am fairly certain it is creating only serial HDF5 files. A little more confusing is that HDF5's error messages seem to be savvy to that fact... they report MPI rank IDs. I am assuming it is somehow interrogating them for added convenience in reporting the error message? Or is that evidence that somehow the file itself was opened with an MPI communicator?

@brtnfld
Contributor

brtnfld commented Feb 3, 2022

Currently, if HDF5 has parallel enabled, then calling H5Pset_evict_on_close will throw an error regardless of whether it is called with the sec2 driver, so this seems like a bug to me. I'm not sure why all the ranks would print the error stack message if Silo were called from one rank. That is strange. Maybe it is a feature when building with parallel enabled. I would assume that all the ranks would print the original error message if more than one rank calls Silo.
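A standalone sketch of what I mean (my own reconstruction, independent of Silo's internals): with an HDF5 build configured with parallel support, the following fails even though only the serial sec2 VFD is requested.

```c++
// Reconstruction of the failure mode (not Silo code): in a parallel-enabled
// HDF5 build, H5Pset_evict_on_close() errors out even on a sec2 (serial) fapl.
#include <mpi.h>
#include <hdf5.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_sec2(fapl);                        // explicitly request the serial sec2 VFD

    herr_t err = H5Pset_evict_on_close(fapl, 1);   // fails and dumps the error stack
    if (err < 0)
        std::fprintf(stderr, "H5Pset_evict_on_close failed\n");

    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```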

@markcmiller86
Member

> I'm not sure why all the ranks would print the error stack message

Well, the application may be running with one Silo file per processor, and all ranks are opening a Silo file with the sec2 driver. It seems like the message about evict on close is issued from only one rank, but an error "stack" is getting dumped from all ranks.

@brtnfld
Contributor

brtnfld commented Feb 3, 2022

Yes, that would make sense; I forgot about the file-per-process case.

@markcmiller86
Member

markcmiller86 commented Feb 3, 2022

I like the idea of the parallel HDF5 library reporting MPI ranks in its error messages even when using non-parallel drivers. That could be useful at large scale where an odd-ball failure occurs on some of the ranks.

That said, I think it can only be assuming MPI_COMM_WORLD to do that, right? Well, I am betting it is using MPI_COMM_WORLD when there is no communicator defined for the file.

But those rank IDs might be confusing to an application that is using a subset MPI communicator for all its I/O, because the message does not mention that those are the rank IDs in the world communicator.
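For example (a contrived sketch, not from the reporter's application), a rank in an I/O sub-communicator generally does not match the MPI_COMM_WORLD rank that such a message would report:

```c++
// Contrived illustration: ranks in an I/O sub-communicator differ from ranks
// in MPI_COMM_WORLD, which is (presumably) what the error message reports.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Put the odd-numbered world ranks into their own I/O communicator.
    MPI_Comm io_comm;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &io_comm);

    int io_rank;
    MPI_Comm_rank(io_comm, &io_rank);

    // e.g. world rank 5 is rank 2 within its io_comm, so "MPI-process 5" in an
    // error message would not match the rank the application uses for I/O.
    std::printf("world rank %d -> io rank %d\n", world_rank, io_rank);

    MPI_Comm_free(&io_comm);
    MPI_Finalize();
    return 0;
}
```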

When HDF5 is using MPI_COMM_WORLD to report those error stack messages, I would recommend a slight alteration of the message. From...

HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 4:

to

HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 4 (in MPI_COMM_WORLD):

and maybe (or not)

HDF5-DIAG: Error detected in HDF5 (1.12.1) MPI-process 4: (in file's MPI_Comm)

when it actually has a file MPI_Comm.

I am guessing the guard logic for this error message...

H5Pset_evict_on_close(): evict on close is currently not supported in parallel HDF5

is handled differently and maybe restricted by rank in some way? Because @eschnett didn't report 8 of those messages, just 8 HDF5 error stack messages.

@eschnett
Author

eschnett commented Feb 3, 2022

Yes, I am writing one Silo file per process.
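The pattern is roughly the following (a simplified sketch, not the actual application code):

```c++
// Simplified file-per-process sketch (placeholder names): every MPI rank
// creates its own serial Silo/HDF5 file, so every rank hits the same error.
#include <mpi.h>
#include <silo.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char filename[256];
    std::snprintf(filename, sizeof filename, "output.%04d.silo", rank);

    DBfile *file = DBCreate(filename, DB_CLOBBER, DB_LOCAL,
                            "per-rank output", DB_HDF5);
    if (file != nullptr)
        DBClose(file);

    MPI_Finalize();
    return 0;
}
```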

@eschnett
Author

This problem means that I cannot use Silo 4.11, and I am thus using Silo 4.10 instead. I have recently learned (spack/spack#34786) that Silo 4.10 requires HDF5 1.8 and does not support later versions of HDF5. This combination of Silo<->HDF5 constraints is rather inconvenient... Is there a way to resolve one of them?
