-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/file #54
base: master
Are you sure you want to change the base?
Feat/file #54
Conversation
…riting for who creates the hdf5 file
@bozhang-hpc Thanks for creating this feature. I think most of this makes good sense. I left review comments. A summary and some additional items:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment in PR
{ | ||
meminfo_t meminfo = parse_meminfo(); | ||
|
||
if(meminfo.MemAvailableMiB < (size_MB + meminfo.MemTotalMiB - threshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a mode to tally this synthetically (i.e. increment in ls_add and decrement in ls_free and act when that tally passes some threshold). A user might (and probably will) want to ration server ranks in order to maintain a certain headroom per process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can definitely add a memory usage keeper for ls_add()
and ls_remove()
. But the problem is, and also the reason why I didn't implement this way is: we are not actually allocating a new memory region when we add od
to ls
(ls_remove()
really releases the memory), but we allocate the memory for od
before the bulk transfer, and assign the od->data
as the bulk transfer destination, then we attach this od
's ptr to the ls
. So if we really want to keep track of the memory usage to prevent the overflow, we might already got segfault before the checking in ls_add()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can address the order of operations by making a memory tally for local storage and doing setters and getters for it. In reality, running out of physical memory won't segfault, it will start swapping into virtual memory.
…d FILEBACKEND option for swap
…swap headers; add find_package for NetCDF
…ther functions to it
CMakeLists.txt
Outdated
endif() | ||
|
||
if(HDF5_FOUND OR NetCDF_FOUND) | ||
option(FILEBACKEND_FOUND "Option to Enable File Swap on DataSpaces Server" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want set
here instead of option
.
This PR adds a swap space for dspaces when the staging memory is about to be insufficient.
User can specify the staging memory quota in the
dspaces_conf.toml
file, and when theput_rpc()
on the server side is going to allocate the buffer for bulk data transfer, it checks if the memory quota is reached and starts to write the staged data objects into a hdf5 file.The policy of which staged data object is poped out can be also specified in the
dspaces_conf.toml
file. Currently, we support FIFO (default), LIFO, and LRU.When reading the data, specifically, calling
get_rpc()
on the server side, it will first check if the queried data object is in the staging memory. If not found, it will try to read the data back from the hdf5 file to the staging memory. If new allocated buffer for reading also triggers the staging memory quota, it will first pop objects from the staging memory according to the user-specified policy until the server has enough memory for reading the data back.