
fix(mempool): nonce ordering #450

Merged
merged 18 commits into from
Jan 17, 2025

Conversation

Trantorian1 (Collaborator)

Pull Request type

  • Bugfix

What is the current behavior?

Transactions added to the mempool are not ordered by nonce and do not wait for previous nonces. This is an issue, as it leads to valid dependent transactions from the same account being rejected if they are ever received out of order, which can happen due to network latency, for example.

Resolves: #442

What is the new behavior?

Transactions in the mempool are now ordered by readiness. A transaction is ready if its nonce directly follows the previous nonce for that account, as stored in db. Otherwise, the transaction is marked as pending. Pending transactions are still subject to removal due to their age.
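The readiness rule above can be sketched as follows. `classify`, `Readiness`, and the `u64` nonces are illustrative stand-ins, and the assumption that the db stores the last executed nonce per account is mine, not stated in the PR:

```rust
// Illustrative sketch, not the actual Madara API: a tx is "ready" iff its
// nonce is the next one expected for the account, i.e. the nonce right after
// the last executed nonce stored in db (or nonce 0 for a fresh account).
#[derive(Debug, PartialEq, Eq)]
enum Readiness {
    Ready,
    Pending,
}

fn classify(tx_nonce: u64, last_db_nonce: Option<u64>) -> Readiness {
    let next_expected = match last_db_nonce {
        Some(n) => n + 1,
        None => 0, // assumption: fresh accounts start at nonce 0
    };
    if tx_nonce == next_expected {
        Readiness::Ready
    } else {
        Readiness::Pending
    }
}

fn main() {
    // First tx from a fresh account is ready.
    assert_eq!(classify(0, None), Readiness::Ready);
    // Nonce 2 arriving before nonce 1 waits as pending instead of
    // being rejected, which is the behavioral change of this PR.
    assert_eq!(classify(2, Some(0)), Readiness::Pending);
    // The directly following nonce is ready.
    assert_eq!(classify(1, Some(0)), Readiness::Ready);
}
```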

Does this introduce a breaking change?

Yes. The database schema has been updated to store mempool transaction readiness. This does not affect full nodes but will pose problems when migrating a sequencer or devnet node.

@Trantorian1 Trantorian1 added the bug (Report an issue or unexpected behavior) and sequencer (Related to the sequencing logic and implementation) labels Jan 1, 2025
@Trantorian1 Trantorian1 self-assigned this Jan 1, 2025
Comment on lines 127 to 130
self.timestamp
.cmp(&other.timestamp)
.then_with(|| self.contract_address.cmp(&other.contract_address))
.then_with(|| self.nonce.cmp(&other.nonce))
Contributor

What is the use of Ord here? Sorting by timestamp first scares me a bit since it can easily be influenced by users.

Collaborator Author

So, a TransactionIntent represents the ordering of transactions in the mempool, as well as extra information to retrieve the actual transaction. You can find some docs on the matter in intent.rs and mempool/inner/mod.rs. timestamp cannot be influenced by the user and only represents the "time at which the transaction was received by the mempool". I'm a bit confused how we should be sorting transactions in the mempool if not by timestamp.

Collaborator Author
@Trantorian1 Trantorian1 Jan 9, 2025

I would suggest reading the docs for MempoolInner first, as they provide a more general overview of the Mempool. If your IDE supports displaying rustdoc inline, I also suggest performing the review there instead of on GitHub, as you should find the code much easier to navigate through the docs.

Contributor

I'm a bit confused how we should be sorting transactions in the mempool if not by timestamp.

I think my confusion is that this is shared for both ready and pending, which are used in very different ways (a "mixed-account queue" and a "per-account queue", respectively).

For tx_intent_queue_ready: BTreeSet<TransactionIntentReady> I think this sort criterion is fine, although you'd eventually want to replace txn inclusion priority with something based on economics instead of arrival time.

But for tx_intent_queue_pending: HashMap<Felt, BTreeSet<TransactionIntentPending>> I would imagine nonce should be the only sorting you care about. Sorting by timestamp could be wrong (if you allowed replacing txns, for example).
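The two queue shapes being discussed can be sketched like this. The type and field definitions are illustrative, not the actual Madara code; the point is that derived `Ord` in Rust compares fields in declaration order, so field order encodes the sort criteria:

```rust
use std::collections::{BTreeSet, HashMap};

// Ready queue: one mixed-account set, ordered by arrival time first
// (illustrative definition; u64 stands in for a Felt).
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct TransactionIntentReady {
    timestamp: u64,
    contract_address: u64,
    nonce: u64,
}

// Pending queues: one set per account, ordered by nonce only, as
// suggested in the review above.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct TransactionIntentPending {
    nonce: u64,
}

fn main() {
    let mut ready: BTreeSet<TransactionIntentReady> = BTreeSet::new();
    ready.insert(TransactionIntentReady { timestamp: 20, contract_address: 1, nonce: 0 });
    ready.insert(TransactionIntentReady { timestamp: 10, contract_address: 2, nonce: 0 });
    // The earliest arrival comes out first, regardless of account.
    assert_eq!(ready.first().unwrap().contract_address, 2);

    let mut pending: HashMap<u64, BTreeSet<TransactionIntentPending>> = HashMap::new();
    let account = pending.entry(1).or_default();
    account.insert(TransactionIntentPending { nonce: 5 });
    account.insert(TransactionIntentPending { nonce: 3 });
    // Per-account pending intents pop in nonce order, not arrival order.
    assert_eq!(account.first().unwrap().nonce, 3);
}
```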

Collaborator Author

Yep, you're right, thanks for catching that one!

Contributor
@notlesh notlesh left a comment

This looks good so far, but it deserves a more thorough review than I gave it.

One thing I wanted to point out is that tests for re-submitting with the same or a lower nonce would be good. The former is used frequently to alter txns in other blockchains (e.g. submitting a txn with the same nonce but a higher tip allows optimistic overriding of a txn).

@Trantorian1 (Collaborator Author) commented Jan 9, 2025

This looks good so far, but it deserves a more thorough review than I gave it.

One thing I wanted to point out is that tests for re-submitting with the same or a lower nonce would be good. The former is used frequently to alter txns in other blockchains (e.g. submitting a txn with the same nonce but a higher tip allows optimistic overriding of a txn).

Submitting a lower nonce is allowed; however, submitting a tx with the same nonce is not, as it results in a nonce conflict. The current behavior is to reject this for user-submitted transactions, but the logic for replacing transactions with the same nonce is already in place and is used by block production, so we could easily make this the default.
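The insertion policy described here could look roughly like the following. `InsertOutcome`, `insert_tx`, and the `force` flag are hypothetical names for illustration, not the actual Madara API:

```rust
use std::collections::BTreeMap;

// Illustrative sketch: a same-nonce tx is a conflict and is rejected for
// user-submitted transactions, while internal callers such as block
// production may force-replace instead.
#[derive(Debug, PartialEq, Eq)]
enum InsertOutcome {
    Inserted,
    Replaced,
    NonceConflict,
}

/// `queue` maps nonce -> tx payload for a single account.
fn insert_tx(queue: &mut BTreeMap<u64, String>, nonce: u64, tx: String, force: bool) -> InsertOutcome {
    let occupied = queue.contains_key(&nonce);
    if occupied && !force {
        // User-submitted path: duplicate nonces are rejected.
        return InsertOutcome::NonceConflict;
    }
    queue.insert(nonce, tx);
    if occupied {
        // Forced path, as used internally by block production.
        InsertOutcome::Replaced
    } else {
        InsertOutcome::Inserted
    }
}

fn main() {
    let mut queue = BTreeMap::new();
    assert_eq!(insert_tx(&mut queue, 1, "tx-a".into(), false), InsertOutcome::Inserted);
    // A user-submitted duplicate nonce is rejected...
    assert_eq!(insert_tx(&mut queue, 1, "tx-b".into(), false), InsertOutcome::NonceConflict);
    // ...but a forced insert replaces the existing tx.
    assert_eq!(insert_tx(&mut queue, 1, "tx-b".into(), true), InsertOutcome::Replaced);
    assert_eq!(queue[&1], "tx-b");
}
```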

Regarding tests on this matter, I have already added some. Generally speaking, you will find all tests for the mempool (including the inner mempool) here.

Tests insertion of an otherwise pending tx in the mempool when the tx with
the nonce before it is marked as ready.
@Trantorian1 Trantorian1 requested a review from notlesh January 9, 2025 09:17
@Trantorian1 Trantorian1 added the db-migration (Requires database schema changes or migration) label Jan 9, 2025
@jbcaron jbcaron marked this pull request as ready for review January 9, 2025 15:31
@Big621 Big621 left a comment

Thank you

This handles nonce updates in cases where it is necessary to know the value
of a nonce for a contract before it is stored in db.
Member
@cchudant cchudant left a comment

I will review this more in depth tomorrow morning, but it looks very good.

crates/madara/client/db/src/mempool_db.rs (outdated, resolved)
crates/madara/client/db/src/mempool_db.rs (outdated, resolved)
crates/madara/client/mempool/src/inner/intent.rs (outdated, resolved)
pub(crate) nonce: Nonce,
/// This is the [Nonce] of the transaction right after this one. We
/// precompute this to avoid making calculations on a [Felt] in the hot
/// loop, as this can be expensive.
Member

as this can be expensive

Is that true? This is essentially a Felt increment, right?

I think I'm fine with nonce_next for now, but I would love to switch the mempool nonces to u64 at some point.

Collaborator Author
@Trantorian1 Trantorian1 Jan 16, 2025

Here is the implementation of add for MontgomeryBackendPrimeField, which is what Stark252PrimeField relies on (Stark252PrimeField relies on U256PrimeField, which is a type alias to MontgomeryBackendPrimeField):

fn add(a: &Self::BaseType, b: &Self::BaseType) -> Self::BaseType {
    let (sum, overflow) = UnsignedInteger::add(a, b);
    if Self::MODULUS_HAS_ONE_SPARE_BIT {
        if sum >= M::MODULUS {
            sum - M::MODULUS
        } else {
            sum
        }
    } else if overflow || sum >= M::MODULUS {
        let (diff, _) = UnsignedInteger::sub(&sum, &M::MODULUS);
        diff
    } else {
        sum
    }
}

And here is the implementation of add for UnsignedInteger:

pub const fn add(
    a: &UnsignedInteger<NUM_LIMBS>,
    b: &UnsignedInteger<NUM_LIMBS>,
) -> (UnsignedInteger<NUM_LIMBS>, bool) {
    let mut limbs = [0u64; NUM_LIMBS];
    let mut carry = 0u64;
    let mut i = NUM_LIMBS;
    while i > 0 {
        let (x, cb) = a.limbs[i - 1].overflowing_add(b.limbs[i - 1]);
        let (x, cc) = x.overflowing_add(carry);
        limbs[i - 1] = x;
        carry = (cb | cc) as u64;
        i -= 1;
    }
    (UnsignedInteger { limbs }, carry > 0)
}

Supposing this is inlined (hopefully this is the case, as both impls are part of the same crate), we are still looking at the following for a simple add:

fn add(a: &Self::BaseType, b: &Self::BaseType) -> Self::BaseType {
    let mut limbs = [0u64; NUM_LIMBS];
    let mut carry = 0u64;
    let mut i = NUM_LIMBS;
    while i > 0 {
        let (x, cb) = a.limbs[i - 1].overflowing_add(b.limbs[i - 1]);
        let (x, cc) = x.overflowing_add(carry);
        limbs[i - 1] = x;
        carry = (cb | cc) as u64;
        i -= 1;
    }
    let (sum, overflow) = (UnsignedInteger { limbs }, carry > 0);
    if Self::MODULUS_HAS_ONE_SPARE_BIT {
        if sum >= M::MODULUS {
            sum - M::MODULUS
        } else {
            sum
        }
    } else if overflow || sum >= M::MODULUS {
        let (diff, _) = UnsignedInteger::sub(&sum, &M::MODULUS);
        diff
    } else {
        sum
    }
}

My worry was that performing this over a very large number of transactions each insertion or removal might end up being expensive.
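The trade-off being discussed, precomputing nonce_next once at construction so the hot loop only does a plain comparison, can be sketched with u64 standing in for the 252-bit Felt (the names below are illustrative, not the actual intent.rs definitions):

```rust
// Illustrative sketch of the nonce_next precomputation: the increment is
// paid once when the intent is built, so the hot readiness check is a
// single equality comparison with no field arithmetic.
struct TransactionIntentPending {
    nonce: u64,
    /// Precomputed nonce right after this one.
    nonce_next: u64,
}

impl TransactionIntentPending {
    fn new(nonce: u64) -> Self {
        // One increment per insertion, instead of one per hot-loop check.
        Self { nonce, nonce_next: nonce + 1 }
    }

    /// Hot-loop check: does `next` directly follow this intent?
    fn is_followed_by(&self, next: &Self) -> bool {
        self.nonce_next == next.nonce
    }
}

fn main() {
    let a = TransactionIntentPending::new(4);
    let b = TransactionIntentPending::new(5);
    let c = TransactionIntentPending::new(7);
    assert!(a.is_followed_by(&b));
    assert!(!a.is_followed_by(&c));
}
```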

Member

sure okay

crates/madara/client/mempool/src/inner/mod.rs (outdated, resolved)
crates/madara/client/mempool/src/inner/mod.rs (resolved)
/// We have one [Nonce] to [MempoolTransaction] mapping per contract
/// address.
///
/// [Nonce]: starknet_api::core::Nonce
Member

I thought we didn't want to use the starknet-api types wherever possible, cc @jbcaron?

Collaborator Author

Tbh I was just getting tired of having to switch between Nonce and Felt and ended up wrapping everything in Nonce 😅. This is a bit of a shame though, as for some reason Nonce is not #[repr(transparent)].

crates/madara/client/mempool/src/inner/mod.rs Outdated Show resolved Hide resolved
phantom: std::marker::PhantomData,
});
debug_assert!(removed);
self.limiter.mark_removed(&TransactionCheckedLimits::limits_for(&previous));
Member

We need to decrement self.deployed_contracts here if it's replaced? This is kinda sus, we need to proptest that.

Member

To be clear, the DeployedContracts thing is because of a very specific and weird corner case:
We need to support cases where clients send a DeployAccount transaction followed by Invoke transactions, and in that case we cannot run the validate step of the outer mempool. We have to skip that validation step in that case, for better UX.
The Argent wallet notably does that when sending a transaction from a wallet that's not yet deployed.
I am thinking of reconsidering how we deal with that corner case, especially if the invoke tx arrives before the deploy account (could that be possible in p2p? idk).

Member

(we should add a comment about why it's there)

Collaborator Author
@Trantorian1 Trantorian1 Jan 16, 2025

This is already being proptested :D deployed_contracts is checked in the impl of CheckInvariants for MempoolInner, here and here, and this is called by the proptest each time a state transition is applied to the System Under Test. The conditional incrementation of deployed_contracts is actually the fix for a bug which was found by proptesting :) You can find a unit test for this specific case, which might provide more context.
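The invariant being discussed can be illustrated with a hand-rolled sketch. The actual code uses the proptest crate and a CheckInvariants trait; the names and the simplified counter below are mine, and `check_invariants` is called after every state transition, as described above:

```rust
use std::collections::HashMap;

// Illustrative sketch: the cached `deployed_contracts` counter must always
// match a recount over the stored transactions, including on replacement.
#[derive(Clone, Copy, PartialEq, Eq)]
enum TxKind {
    DeployAccount,
    Invoke,
}

struct Mempool {
    txs: HashMap<u64, TxKind>,  // tx id -> kind
    deployed_contracts: usize,  // cached count of DeployAccount txs
}

impl Mempool {
    fn new() -> Self {
        Self { txs: HashMap::new(), deployed_contracts: 0 }
    }

    fn insert(&mut self, id: u64, kind: TxKind) {
        // Replacement is the subtle case: a replaced DeployAccount must
        // decrement the counter before the new kind is accounted for.
        let previous = self.txs.insert(id, kind);
        if previous == Some(TxKind::DeployAccount) {
            self.deployed_contracts -= 1;
        }
        if kind == TxKind::DeployAccount {
            self.deployed_contracts += 1;
        }
    }

    fn remove(&mut self, id: u64) {
        if self.txs.remove(&id) == Some(TxKind::DeployAccount) {
            self.deployed_contracts -= 1;
        }
    }

    /// Invariant: cached counter equals a full recount of stored txs.
    fn check_invariants(&self) {
        let recount = self.txs.values().filter(|k| **k == TxKind::DeployAccount).count();
        assert_eq!(self.deployed_contracts, recount);
    }
}

fn main() {
    let mut mempool = Mempool::new();
    // Apply transitions and re-check the invariant after each one,
    // mimicking what the proptest does against the System Under Test.
    for (id, kind) in [(1, TxKind::DeployAccount), (2, TxKind::Invoke), (1, TxKind::Invoke)] {
        mempool.insert(id, kind);
        mempool.check_invariants();
    }
    mempool.remove(2);
    mempool.check_invariants();
    assert_eq!(mempool.deployed_contracts, 0);
}
```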

Member
@cchudant cchudant left a comment

Fix deployed contracts invariant checking; see commit f801aa0, which makes the proptest fail.

@cchudant (Member)

I think other than that it's fine, but the complexity of all the invariants now makes me a bit worried..

@Trantorian1 (Collaborator Author)

Fix deployed contracts invariant checking; see commit f801aa0, which makes the proptest fail.

Just pushed a fix to this!

@Trantorian1 (Collaborator Author)

I think other than that it's fine, but the complexity of all the invariants now makes me a bit worried..

Ehh, I don't like it too much either, but this is also just an inherently complex part of the codebase :/

@Trantorian1 Trantorian1 requested a review from cchudant January 17, 2025 07:19
Member
@cchudant cchudant left a comment

👍

@antiyro antiyro merged commit 4fb165a into main Jan 17, 2025
10 checks passed
Labels
bug: Report an issue or unexpected behavior
db-migration: Requires database schema changes or migration
sequencer: Related to the sequencing logic and implementation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

bug(mempool): correctly handle tx from same account
6 participants