Asynchronous Execution
Background: interleaving execution and consensus is inefficient
Consensus is the process where nodes come to agreement about the official ordering of transactions. Execution is the process of actually executing those transactions and updating the state.
In Ethereum and most other blockchains, execution is a prerequisite to consensus. When nodes come to consensus on a block, they are agreeing on both (1) the list of transactions in the block, and (2) the merkle root summarizing all of the state after executing that list of transactions. As a result, the leader must execute all of the transactions in the proposed block before sharing the proposal, and the validating nodes must execute those transactions before responding with a vote.
We refer to this style of blockchain as one that has execution interleaved with consensus. In this paradigm, the time budget for execution is extremely limited, since it has to happen twice and leave enough time for multiple rounds of cross-globe communication for consensus.
Additionally, since execution will block consensus, the per-block gas limit must be chosen extremely conservatively to ensure that the computation completes on all nodes within the budget even in the worst-case scenario.
The result is that the per-block gas limit is a tiny fraction of the block time. In particular, in Ethereum, the gas limit (30M worst-case gas) corresponds to a roughly 100ms time budget for a system with a block time of 12 ms:
That's 1% of the block time! In short, interleaving consensus and execution has a massive time-shrinking effect.
What if it didn't have to be this way?
Asynchronous Execution explained
Monad decouples consensus from execution, moving execution out of the hot path of consensus into a separate, slightly-lagged swim lane.
In Monad, nodes come to consensus (i.e. agreement about the official ordering of transactions), without either the leader or validating nodes having to have executed those transactions yet.
That is, the leader proposes an ordering without yet knowing the resultant state root, and the validating nodes vote on block validity without knowing (for example) whether all of the transactions in the block execute without reverting.
As soon as a block is finalized, each node (leaders and validators) executes the transactions in that block.
As a result of this simple change, execution can be budgeted the full block time. To see why, consider the somewhat stylized diagrams, in which dark purple rectangles correspond to execution, and light purple rectangles correspond to consensus:
In interleaved execution, the sum of the execution and consensus budgets equals the block time, and consensus occupies most of the block time.
In asynchronous execution, consensus occupies the full block time - and so does execution, because they are occurring in separate swim lanes.
Comparing the two styles side by side, initially the result seems similar (with asynchronous execution receiving a slight edge from being able to use the full block time for consensus.
But once we adopt the asynchronous style, the execution budget can be massively raised (corresponding to larger dark rectangles):
Utilizing the full block time for execution gives a massive boost to execution throughput.
Determined ordering implies state determinism
How does asynchronous execution still achieve state machine replication?
Here is an obvious but crucial insight: starting from a common state, given an official ordering of transactions, the final state is completely determined. Execution is required to unveil the truth, but the truth is already determined.
Monad takes advantage of this insight by removing the requirement that nodes execute prior to consensus. Node agreement is purely about the official ordering; each node executes the transactions in block N
independently while commencing consensus on block N+1
.
This allows for a gas budget corresponding to the full block time, since execution merely needs to keep up with consensus. Additionally, this approach is more tolerant of variation in exact compute time, since execution only needs to keep up with consensus on average.
This is true even if some transactions in the list fail, since they will fail consistently across all nodes. Below is an example:
Delayed merkle roots still ensure state machine replication
The main objections one might have to Asynchronous Execution are:
- What happens if one of the nodes is malicious, so doesn't execute the exact transactions specified in consensus? (For example, say it omits certain transactions, or just sets a state variable to an arbitrary value of its choice.)
- What happens if one of the nodes makes a mistake in execution?
To address these concerns, in Monad the block proposals include a merkle root delayed by D
blocks, where D
is a systemwide parameter (currently anticipated to be 10).
As a result of this delayed merkle root:
- After the network comes to consensus (2/3 majority vote) on block
N
, it means that the network has agreed that the official consequence of blockN-D
is a state rooted in merkle rootM
. Light clients can then query full nodes for merkle proofs of state variable values at blockN-D
. - Any node with an error in execution at block
N-D
will fall out of consensus starting at blockN
. This will trigger a rollback on that node to the end state of blockN-D-1
, followed by re-execution of the transactions in blockN-D
(hopefully resulting in the merkle root matching), followed by a re-execution of the transactions in blockN-D+1
,N-D+2
, etc.
Ethereum's approach uses consensus to enforce state machine replication in a very strict way: after nodes come to consensus, we know that the supermajority agrees about the official ordering and the state resulting from that ordering. However, this strictness comes at great cost - extremely limited throughput. Monad loosens the strictness slightly, to great effect.
On finality
In MonadBFT, finality is single-slot (1 second), and execution outcome will generally lag by less than 1 second for those using a full node. Let's unpack this a bit.
Finality in Monad is single-slot (1 second). If you submit a transaction, you will see the transaction's official order (among all other transactions) after a single block. There is no potential for reordering, barring malicious action from a supermajority of the network. This makes Monad's finality significantly faster than Ethereum's (2 epochs, aka 12.8 minutes).
The execution outcome of a transaction (did it succeed or fail? what are the balances afterward?) will generally lag finality by less than 1 second on full nodes. Anyone who needs to know the outcome of a transaction quickly (for example, a high-frequency trader wanting to know the status of an order) can run a full node. Monad is designed to minimize the cost of full nodes; see Hardware Requirements for further information.
Anyone who wants to securely query the outcome of a transaction without running a full node can run a light client while querying a full node for balances with merkle proofs. In this case, queries will be lagged by the merkle root delay (D=10
blocks, i.e. 10 seconds). Note that most users currently view the state of the blockchain using software (browser) wallets or via a block explorer. Neither of these usage patterns involve a light client.
Some readers might mistakenly conflate the merkle root delay (D=10
blocks) with finality and mistakenly assume that finality is 10 blocks. That's not true. The official transaction order is determined after 1 block, after which there will be no reorgs without Byzantine behavior from a supermajority.
Balance validation at time of consensus
Deferring execution is incredibly powerful, as it allows execution and consensus to occur in parallel. This massively expands the time budget for execution.
An obvious objection is: now that consensus nodes have a delayed view of state, what prevents them from mistakenly including transactions from accounts that have spent all of their gas? This would create a Denial-of-Service vector.
To defend against this, Monad nodes validate that the balance of the account is sufficient to fullfill the maximum debit in the user's account associated with in-flight transactions. For each transaction, this is computed as a sum of the value transfer (value
) and the amount of gas a user is willing to spend (gas_limit
) multiplied by maximum amount a user is willing to pay per gas (maxFeePerGas
). If the available balance is not sufficient to pay the sum of value and maximum cost of the transaction, the transaction is not included in a block.
You can think of the available balance tracked by consensus as decrementing in realtime as transactions are included in blocks. Although the node's view of the full state is lagged, the available balance always reflects up-to-date expenditures.