Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] #14803 State changes for Virtual Mega Map #17689

Draft
wants to merge 13 commits into
base: 16785-D-virtualmap-keybytes-valueobjects
Choose a base branch
from

Conversation

thenswan
Copy link
Contributor

@thenswan thenswan commented Feb 3, 2025

Description:
WIP

Related issue(s):
Fixes #14803

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

}

@Override
public void put(T value) {
this.value = value == null ? NULL_VALUE : value;
this.modification = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Null values need to be handled separately. If a value was not null, then put(null) was called, modification is null, and isModified() below returns false, which is incorrect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 🙌🏻 Addressed, please look into it again 🙂

}

/** {@inheritDoc} */
@Override
public long size() {
final var size = virtualMap.size();
final var size = megaMap.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, mark this method as deprecated (or even drop it completely, if possible). State sizes will be tracked using a different mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to mark this method as deprecated rather than remove it, as it's still in use, and removing it would further increase the size of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should OnDiskWritableKVState#sizeOfDataSource also be deprecated?


final Bytes keyBytes = keyCodec.toBytes(key);

return stateIdBytes.append(keyBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be too much pressure on GC, too many short living objects are allocated. What about

final byte[] bytes = new byte[Short.BYTES + keyCodec.measureRecord(key)];
final BufferedData writer = BufferedData.wrap(bytes);
writer.writeShort((short) stateId);
keyCodec.write(key, writer);
return Bytes.wrap(bytes);

This is still not ideal, but slightly better than above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Optimized it, there are a couple of details:

  1. There is no writeShort method -- made a workaround.
  2. Wdyt about exception handling?


// Queue head index. This is the index at which the next element is retrieved from
// a queue. If equal to tail, the queue is empty
private long head = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Head / tail may not be local vars or fields, they have to be persisted in the store (mega map), otherwise they would always be set back to 1 for every queue, when a state snapshot is read from disk.

Multiple issues here:

  • There should be a way to store head/tail in the store. I suggest introducing a new class to hold this information (queue state) and serialize/deserialize head/tail (two longs)
  • This class should provide atomic operations to increase head/tail. Writes are single-threaded, so "head" should be more or less fine, but reads may be done on any thread. Well, I actually don't quite understand what a "read" operation for a queue is. Every "read" is actually a "write", unless it's a "peek". Something to think about
  • After every change to head/tail, i.e. on every add() or remove() call, the queue state must be updated and persisted to the store
  • I can't say for sure, but the queue state should be shared between readable and writable queue states - or it can be read from the store on every call (performance impact is tbd)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

* @return a {@link Bytes} object containing exactly 2 bytes in big-endian order
* @throws IllegalArgumentException if the state ID is outside [0..65535]
*/
private Bytes getMegaMapKey() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to share such methods between readable and writable states (singleton, queue, and KV). For example, it can be defined in ReadableSingletonStateBase, which is then extended by WritableSingletonStateBase and OnDiskReadableSingletonState

Copy link
Contributor Author

@thenswan thenswan Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and came to the conclusion that this method is an implementation detail of "OnDisk" classes, so I decided to find another way after addressing fundamental review points 🙂 Need to take into account that this method will also be used for migration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized this

…ate-changes-for-virtual-mega-map

# Conflicts:
#	platform-sdk/swirlds-state-impl/src/main/java/com/swirlds/state/merkle/disk/OnDiskReadableKVState.java
#	platform-sdk/swirlds-state-impl/src/main/java/com/swirlds/state/merkle/disk/OnDiskWritableKVState.java
Signed-off-by: Nikita Lebedev <[email protected]>
Signed-off-by: Nikita Lebedev <[email protected]>
Signed-off-by: Nikita Lebedev <[email protected]>
@NonNull final Consumer<T> backingStoreMutator) {
super(stateKey, backingStoreAccessor);
this.backingStoreMutator = Objects.requireNonNull(backingStoreMutator);
public WritableSingletonStateBase(@NonNull final String serviceName, @NonNull final String stateKey) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the states now accept "service name" and "state key" in constructors. These two values are used to compute the right VirtualMap key prefix for the state. It's done by finding the right state ID first using StateUtils.stateIdFor(), and then this ID is converted to bytes. Wouldn't it be easier to compute the ID once and store it in states rather than service name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to change WritableSingletonStateBase constructor signature to public WritableSingletonStateBase(@NonNull final Bytes stateId, @NonNull final String stateKey)?

@Override
protected E peekOnDataSource() {
final QueueState state = queueHelper.getState();
final E value = state.isEmpty() ? null : queueHelper.getFromStore(queueHelper.getState().getHead());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/queueHelper.getState()/state/

*/
@Override
protected T readFromDataSource() {
final var value = virtualMap.get(getVirtualMapKey(serviceName, stateKey), valueCodec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VirtualMap key for singletons is a constant, no need to calculate it every time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is cached in StateUtils anyway, so if I start updating it, it will bloat the code in the multiple classes (where a similar update can be introduced), which is redundant, in my opinion. Wdyt?

virtualMap.remove(getVirtualMapKey(serviceName, stateKey, state.getHeadAndIncrement()));
// Log to transaction state log, what was added
logQueueRemove(computeLabel(serviceName, stateKey), valueToRemove);
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the state is empty (queue has no elements), should this method throw an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should not because of this condition: value == null ? "null" : value.toString()

@Override
protected void addToDataSource(@NonNull E element) {
final QueueState state = queueHelper.getState();
virtualMap.put(getVirtualMapKey(serviceName, stateKey, state.getTailAndIncrement()), element, valueCodec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only the element should be put to the virtual map, but the updated state (queue state) should be put, too, same in removeFromDataSource()

// TODO: decide on method from VirtualMapMigration
@Override
public MerkleNode migrate(@NonNull final Configuration configuration, int version) {
if (version < CURRENT_VERSION) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually CURRENT_VERSION may be increased to 33, 34, and so on. It would be safer to change this to if (version < 32)

DigestType.SHA_384,
MEGA_MAP_MAX_KEYS_HINT,
merkleDbConfig.hashesRamToDiskThreshold());
final var virtualMapLabel = "VirtualMap";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "StateMap" or "RootMap"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I favor the "StateMap," which indicates that this map holds all state data.

@imalygin wdyt?

// Threads which iterate over the given Virtual Map, perform some operation and write into its own output queue/buffer
final int THREAD_COUNT = 1;
final long MEGA_MAP_MAX_KEYS_HINT = 1_000_000_000;
final boolean VALIDATE_MIGRATION_FF = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread count and validation flag are to be extracted into a config, so they can be set without changing code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that was the plan, but I need support with it 🙂

// TODO: double check here if it is valid to use #putBytes
// TODO: check possibilities for optimization
InterruptableConsumer<Pair<Bytes, Bytes>> handler = pair ->
virtualMap.putBytes(stateIdBytes.append(pair.key()), pair.value());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if stateIdBytes.append(key) is faster or slower than getVirtualMapKey(serviceName, stateKey, key). Worth measuring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I marked this place for performance inspection as it is worth trying and measuring options.


logger.info(STARTUP.getMarker(), "Migrating Queue states to the one Virtual Map...");

IntStream.range(0, getNumberOfChildren())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it slightly easier to read, KV, queue, and singleton state migrations can be extracted into three separate methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can propose to refactor it further:

  • use for loop instead of streams
  • extract repetitive code (logging, etc.)

Wdyt?

final var labelPair = decomposeLabel(label);
final var serviceName = labelPair.key();
final var stateKey = labelPair.value();
final FCQueue<ValueLeaf> originalStore = queueNode.getRight();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FCQueue is probably redundant here. QueueNode has iterator() method, which iterates over all elements in the queue state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the code from here: #15369
I think it is how it is for a reason -- to get the codec from ValueLeaf so it can be put to the Virtual Map.

Signed-off-by: Nikita Lebedev <[email protected]>
@@ -810,14 +810,14 @@ void migrate_currentVersion() {
var node2 = mock(MerkleNode.class);
stateRoot.setChild(1, node2);
reset(node1, node2);
assertSame(stateRoot, stateRoot.migrate(CURRENT_VERSION));
assertSame(stateRoot, stateRoot.migrate(, CURRENT_VERSION, ));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think think it compiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I changed many things and was waiting for the "direction approval" before going on and doing a lot of other changes.

modifications.clear();
super.reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - why did you flip the order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I think it was accidental, thanks for pointing it out. But I checked it more carefully now and noticed that the order in this class and in WritableSingletonStateBase is different on main, so maybe we can discuss how to update it, if it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants