Solve multi-versioning persistence problem #412

miloszm · 2024-12-05T19:28:54Z

Summary

Currently, each commit holds its own Merkle-tree positions file, which incurs storage overhead.
A scheme that eliminates this extra per-commit storage is needed.

Possible solution design or implementation

One possible solution is to modify the "main" data section in such a way that the parts belonging to outstanding commits are preserved. This involves invalidating sessions that are in progress, so we need to devise a scheme that either delays the modification until a time between sessions or duplicates the affected parts so that they do not conflict.

To be more precise: when finalising, some memory files in the "main" data section will be overwritten even though they still belong to active commits. This is easy to alleviate by copying these "main" files to commit-specific data segments (only baseless commits and commits whose base is the current commit need to be considered). The caveat is that after such copying we need to refresh any "contract" objects that may be part of a session. This can only be done effectively between sessions (at session start-up, to be exact), not while a session is running. Hence, the changes need to be delayed, or arranged so that they do not affect running sessions and can "wait" until session start-up.

Another, much preferred, solution would be to leave the memory files in place and duplicate them to another location, which from then on is considered the "main". This requires a mechanism for a floating (or moving) "main" surface (or edge). It eliminates the need for delays and for dealing with sessions, at the cost of additional complexity in the storage scheme.

Conclusion

After consideration, the moving "main" edge seems to be the direction to take. It needs to be elaborated into an exact specification, which is the subject of this issue.

Additional explanation

Currently, the main "edge" is flat: it consists of the contents of the "main"/"memory" folder. We need to devise a scheme that allows the "edge" to live in temporary subfolders created on demand, which exist as long as the outstanding commits that need them are alive. When an outstanding commit is finalised, the edge can be collapsed, yet we need to be careful about other commits that may still rely on it.
It is probably much simpler to have an edge that never collapses, with periodic garbage collection clearing the old layers. Our edge would basically always "move on", so it would be a "moving" edge rather than a "floating" edge.
Each commit could have an "edge level" assigned to it, and the commit would stick to this level for as long as it lives.
Once no commits use a given level, the level can be removed.
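
The level bookkeeping described above could be sketched as follows. This is only an illustration under assumptions: the names `MovingEdge`, `register_commit`, `advance`, `release_commit` and `collect_garbage`, and the use of `u64` for commit ids and levels, are hypothetical and not part of Piecrust.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical identifier types; the issue does not define concrete ones.
pub type CommitId = u64;
pub type Level = u64;

/// A "moving" edge: the current level only ever advances, and an old level
/// is dropped once no live commit is pinned to it.
#[derive(Default)]
pub struct MovingEdge {
    current: Level,
    /// Number of live commits pinned to each level that was ever used.
    users: BTreeMap<Level, usize>,
    /// The level each live commit was assigned when it was registered.
    pinned: HashMap<CommitId, Level>,
}

impl MovingEdge {
    /// Pin a commit to the current level; it keeps that level while it lives.
    /// Registering the same commit twice returns its existing level.
    pub fn register_commit(&mut self, commit: CommitId) -> Level {
        if let Some(level) = self.pinned.get(&commit) {
            return *level;
        }
        let level = self.current;
        self.pinned.insert(commit, level);
        *self.users.entry(level).or_insert(0) += 1;
        level
    }

    /// Move the edge forward, e.g. before "main" files would be overwritten.
    pub fn advance(&mut self) -> Level {
        self.current += 1;
        self.current
    }

    /// Release a commit from its level once it no longer needs it.
    pub fn release_commit(&mut self, commit: CommitId) {
        if let Some(level) = self.pinned.remove(&commit) {
            if let Some(count) = self.users.get_mut(&level) {
                *count -= 1;
            }
        }
    }

    /// Periodic garbage collection: forget and return every level below the
    /// current one that has no users left.
    pub fn collect_garbage(&mut self) -> Vec<Level> {
        let current = self.current;
        let dead: Vec<Level> = self
            .users
            .iter()
            .filter(|(level, count)| **level < current && **count == 0)
            .map(|(level, _)| *level)
            .collect();
        for level in &dead {
            self.users.remove(level);
        }
        dead
    }
}
```

In a real implementation, each returned level would correspond to a subfolder (or other storage layer) to delete; the sketch tracks only the reference counts.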

Additional context

This is a future enhancement.

miloszm · 2025-01-28T09:10:05Z

The moving edge model described above is insufficient, as its multiple versions are not fully independent. It needs to be slightly extended to support versions that are fully independent of each other and updatable, as in the "fully persistent" model (see "persistent data structure" on Wikipedia). The complexity and testability requirements of such a solution call for a separate persistence plug-in which will:

be independently tested
be flexible in terms of the storage technique used - filesystem or database
lead to simpler, more readable and more maintainable code

Instead of direct storage code interspersed among commit-handling code, we'll have a clearly defined module that hides which storage technique is being used.

The proposed interface of the plug-in will be as follows:

create_version(version)
finalize_version(version)
store(bytes, contract_id, version, type)
retrieve(contract_id, version, type)

Notes:

type is currently either a memory page index or an element (which contains the Merkle tree of the contract's memory pages)
contract_id could basically be any label, yet we keep the domain-specific nomenclature to make it easier to understand what the module does
version removal (here called finalisation) is important, as we want to avoid the complexity of confluent storage
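
As a rough illustration, the four operations could be expressed as a Rust trait. Everything beyond the four operation names is an assumption made for this sketch: the `Version` and `ContractId` aliases, the `ItemType` enum, the `bool`/`Option` return types, and the in-memory `MemoryStoreroom` used to exercise the trait. The `type` parameter is renamed `item` because `type` is a reserved word in Rust.

```rust
use std::collections::HashMap;

/// Hypothetical aliases; the issue leaves the concrete types open.
pub type Version = u64;
pub type ContractId = [u8; 32];

/// What is stored: a memory page (by index) or the element that contains
/// the Merkle tree of the contract's memory pages.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ItemType {
    Page(u64),
    Element,
}

/// Sketch of the proposed interface; error handling is reduced to
/// `bool` and `Option` to keep the example short.
pub trait Storeroom {
    fn create_version(&mut self, version: Version) -> bool;
    /// Finalisation removes the version and everything stored under it.
    fn finalize_version(&mut self, version: Version) -> bool;
    fn store(&mut self, bytes: &[u8], contract_id: ContractId, version: Version, item: ItemType) -> bool;
    fn retrieve(&self, contract_id: ContractId, version: Version, item: ItemType) -> Option<Vec<u8>>;
}

/// In-memory stand-in, used here only to show the trait in action.
#[derive(Default)]
pub struct MemoryStoreroom {
    versions: HashMap<Version, HashMap<(ContractId, ItemType), Vec<u8>>>,
}

impl Storeroom for MemoryStoreroom {
    fn create_version(&mut self, version: Version) -> bool {
        if self.versions.contains_key(&version) {
            return false; // version already exists
        }
        self.versions.insert(version, HashMap::new());
        true
    }

    fn finalize_version(&mut self, version: Version) -> bool {
        self.versions.remove(&version).is_some()
    }

    fn store(&mut self, bytes: &[u8], contract_id: ContractId, version: Version, item: ItemType) -> bool {
        match self.versions.get_mut(&version) {
            Some(items) => {
                items.insert((contract_id, item), bytes.to_vec());
                true
            }
            None => false, // storing into an unknown version is rejected
        }
    }

    fn retrieve(&self, contract_id: ContractId, version: Version, item: ItemType) -> Option<Vec<u8>> {
        self.versions.get(&version)?.get(&(contract_id, item)).cloned()
    }
}
```

Because each version owns its own map, versions in this sketch are fully independent of each other, and finalising one cannot affect another.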

Initially, a file-system-based version will be implemented and hooked up for realistic testing. Subsequently, not only the "main" managing part of the code will use the plug-in, but the "commit-specific" part as well. For the "commit-specific" code to use the plug-in, the above interface will need to be extended.
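
A file-system-based variant might look roughly like the sketch below. The directory layout (`<root>/<version>/<contract_id>/<item>`), the `FsStoreroom` name, and the use of plain strings for `contract_id` and the item name are all assumptions made for illustration. The sketch also assumes the identifiers are safe to use as path components, and it uses ordinary file reads and writes rather than memory-mapped files.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical file-system storeroom.
/// Assumed layout: <root>/<version>/<contract_id>/<item>
pub struct FsStoreroom {
    root: PathBuf,
}

impl FsStoreroom {
    pub fn new(root: impl AsRef<Path>) -> Self {
        Self { root: root.as_ref().to_path_buf() }
    }

    fn version_dir(&self, version: u64) -> PathBuf {
        self.root.join(version.to_string())
    }

    fn item_path(&self, contract_id: &str, version: u64, item: &str) -> PathBuf {
        self.version_dir(version).join(contract_id).join(item)
    }

    /// Create the directory that holds everything belonging to a version.
    pub fn create_version(&self, version: u64) -> io::Result<()> {
        fs::create_dir_all(self.version_dir(version))
    }

    /// Finalisation removes the whole version directory.
    pub fn finalize_version(&self, version: u64) -> io::Result<()> {
        fs::remove_dir_all(self.version_dir(version))
    }

    /// Write the bytes to the item's file, creating parent directories
    /// (including the version directory) if they do not exist yet.
    pub fn store(&self, bytes: &[u8], contract_id: &str, version: u64, item: &str) -> io::Result<()> {
        let path = self.item_path(contract_id, version, item);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, bytes)
    }

    /// Read the item's file back; a missing item surfaces as an I/O error.
    pub fn retrieve(&self, contract_id: &str, version: u64, item: &str) -> io::Result<Vec<u8>> {
        fs::read(self.item_path(contract_id, version, item))
    }
}
```

Keeping one directory per version makes finalisation a single recursive delete, which is the main reason for this layout in the sketch.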

A database-based plug-in implementation needs to take into account that Piecrust uses memory-mapped files.

Proposed name for the plug-in storage module is: storeroom

Note that the difference between the "commit-specific" and "main" aspects of storage is this: commit-specific storage reflects the fact that one commit can be, and mostly is, based on another commit, whereas "main" holds only one solid (i.e., main) state, which is capable of versioning but has no concept of commits or of dependencies between them.
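
To make the distinction concrete, here is a small sketch of how a commit-specific lookup might follow the chain of base commits, while "main" is a single state with no such chain. The `Commit` and `State` types, the `lookup` function, and in particular the fallback to "main" at the end of the chain are assumptions made for illustration, not a description of Piecrust's actual behaviour. The sketch also assumes the base chain contains no cycles.

```rust
use std::collections::HashMap;

/// Hypothetical identifier types for this sketch.
pub type CommitId = u64;
pub type PageIndex = u64;

/// Commit-specific storage: a commit holds only its own pages and may
/// point to a base commit for everything else.
#[derive(Default)]
pub struct Commit {
    pub base: Option<CommitId>,
    pub pages: HashMap<PageIndex, Vec<u8>>,
}

#[derive(Default)]
pub struct State {
    pub commits: HashMap<CommitId, Commit>,
    /// "main": one solid state, with no dependencies between commits.
    pub main: HashMap<PageIndex, Vec<u8>>,
}

impl State {
    /// Look a page up in the commit, then along its base chain, and
    /// finally (by assumption) in "main". Returns None if the commit or
    /// one of its bases is unknown, or if the page is found nowhere.
    pub fn lookup(&self, commit: CommitId, page: PageIndex) -> Option<&[u8]> {
        let mut cursor = Some(commit);
        while let Some(id) = cursor {
            let current = self.commits.get(&id)?;
            if let Some(bytes) = current.pages.get(&page) {
                return Some(bytes.as_slice());
            }
            cursor = current.base;
        }
        self.main.get(&page).map(|bytes| bytes.as_slice())
    }
}
```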

miloszm self-assigned this Dec 5, 2024

miloszm changed the title ~~Detach commit id from commit root~~ Solve multi-versioning persistence problem Dec 10, 2024
