Speculative Real-Time Data

The Monad architectural overview explains the asynchronous execution feature. Before you consume Monad's fastest real-time data feeds, it is essential to have a basic understanding of how this feature works, especially the material in the speculative execution and block states sections.

Why do I need to understand speculative execution?

Monad's design has considerably more parallelism than a typical EVM-compatible blockchain. This includes nodes speculatively executing the transactions in a newly-received block before being certain that that block will finalize. The real-time data feeds are created during speculative execution, so if you consume them, you might see data about transactions and their effects (e.g., logs and balance changes), but those transactions may never really happen!

To avoid reacting to blockchain data that is not "real", you need to know how Monad real-time data feeds express speculative execution, and how they inform you later whether blocks (and their transactions and state effects) were ultimately added to the blockchain (or not).

Given this additional complexity, why would you consume these real-time feeds? The answer is so that your software can get the same performance benefits from speculative execution that Monad itself gets (explained below).

The cost you pay for this benefit is that you need to understand more about how Monad works, so that you can write your data processing code correctly.

Can I consume real-time data without dealing with speculative execution?

Yes.

Monad currently offers three sources of real-time data. Speculative execution does not affect the Geth real-time events compatibility offering. This data feed waits for blocks to be fully committed to the blockchain before publishing any data, thus filtering out all the real-time updates from blocks that fail to finalize.

info

Because the newHeads and logs subscriptions wait for finalization, they have one additional feature that the original Geth implementation does not: you do not need any logic to handle chain reorganizations, because they are not possible. You will never see the same block number more than once, and logs will never be removed.

Geth-style reorganizations don't occur in the monadNewHeads or monadLogs subscriptions either. Instead, you are explicitly told what the consensus algorithm is doing with the blocks you have already seen, so you know what blockchain data gets committed or not. The purpose of this document is to explain exactly what this information means, so you know how to react to it.

What benefits do I get from speculative execution?

Real-time data generated by speculative execution is valuable to a consumer for two reasons:

It allows you to use the same pipelining tricks that Monad itself uses
Sometimes it's valuable to react as soon as possible, even when the data you're seeing is not a sure thing

Advantage #1: pipelining

Monad's pipelining tricks are explained here and here. You may recall the below illustration of the laundry analogy, which helps explain the concept in both places:

Figure: Pipelining laundry day. Top: naive; bottom: pipelined. Credit: Prof. Lois Hawkes, FSU

Here's a concrete example of how you can use pipelining yourself:

Suppose you are writing an automated trading application. Further suppose that when a certain contract (e.g., a CLOB contract) emits a particular log event, your trading algorithm uses data from that log event as a signal to buy or sell.

Before actually trading, you may need to perform additional actions first. Some of the things you might do are:

Check your risk limits, to see if increasing the position will give you too much exposure, or bring you too close to a potential liquidation
Run a more complex mathematical model, if your strategy needs to perform complex computations based on the input in the trading signal
Create and cryptographically sign the transaction for your buy/sell order

All these things take time, and you can do them in preparation for your eventual trade, while you wait to find out if the block containing the market signal was actually finalized. If the block is finalized, you've already completed the essential work and can just "pull the trigger" (i.e., send the presigned trade transaction message). If the block is not finalized, you just throw the preparatory work away and never do the trade.
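The prepare-then-fire flow described above can be sketched as follows. This is a minimal illustration, not Monad's API: the `TradePipeline` class, its method names, and the byte-string "signing" are all hypothetical stand-ins, and the block ids are borrowed from the diagram later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TradePipeline:
    """Prepares orders during speculative execution; sends only after finalization."""

    # Prepared (already signed) orders, keyed by block id rather than block number.
    prepared: Dict[str, bytes] = field(default_factory=dict)
    sent: List[bytes] = field(default_factory=list)

    def on_signal(self, block_id: str, signal: bytes) -> None:
        # The slow work happens here, while consensus is still deciding:
        # risk checks, model evaluation, and signing (all stubbed out).
        self.prepared[block_id] = b"signed:" + signal

    def on_finalized(self, block_id: str) -> Optional[bytes]:
        # The block is now on the chain, so "pull the trigger".
        order = self.prepared.pop(block_id, None)
        if order is not None:
            self.sent.append(order)  # stand-in for submitting the transaction
        return order

    def on_abandoned(self, block_id: str) -> None:
        # The block will never be finalized: throw the preparatory work away.
        self.prepared.pop(block_id, None)


pipeline = TradePipeline()
pipeline.on_signal("79c25", b"buy")
pipeline.on_signal("3ed4d", b"sell")
pipeline.on_finalized("79c25")
pipeline.on_abandoned("3ed4d")
```

After this run, only the order prepared for the finalized block has been sent, and nothing is left pending. How a consumer learns that a block was abandoned is covered in the commit state sections below.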

Advantage #2: reacting before we know it's "real"

Consider a UI component which wants to keep users informed about the progress of their transaction. When the transaction is speculatively executed, it can be marked as Pending in the UI, so that the user knows it has been seen and there's a very good chance it will go through. Even if it fails to progress to a later commit state, seeing a bit of instant feedback tends to be a superior user experience.

Also, consider again the example of our automated trading application. Timing is very important in financial markets. When prices are changing rapidly, a trade right now could be much more valuable than a trade a few seconds later. It might make sense to initiate a trade immediately off of speculative data, even knowing that some tiny percentage of the time, the trade is based on a false premise.

You may lose money sometimes, i.e., when your strategy reacts to "not real" market data in a block that fails to finalize. But usually this does not happen, and the gains from being early most of the time could outweigh the occasional losses when you react to "false" data.

There are also other kinds of applications, e.g., on-chain games, where being more interactive is better than being perfectly accurate all the time.

Block commit states

Elsewhere in the documentation, we learned that a block can be in one of four states: Proposed, Voted, Finalized, and Verified. These are sometimes called "commit states" or "consensus states" in the documentation.

In speculative real-time data feeds, whenever you are given blockchain data, you will also be told:

What commit state the associated block is in
If the initial state was not verified, you will be notified at some point later when the block transitions to a different state

Later in this article, we'll walk through exactly how the process happens in the current version of the software.
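As a rough sketch of what tracking these notifications could look like, the snippet below models the four commit states as an ordered enum and records the latest state seen for each block. The names `CommitState` and `BlockStateTracker` are hypothetical, and the sketch assumes that a given block only ever moves forward through the states.

```python
from enum import IntEnum
from typing import Dict


class CommitState(IntEnum):
    # Ordered so that a later state compares greater than an earlier one.
    PROPOSED = 1
    VOTED = 2
    FINALIZED = 3
    VERIFIED = 4


class BlockStateTracker:
    def __init__(self) -> None:
        self.states: Dict[str, CommitState] = {}

    def update(self, block_id: str, state: CommitState) -> bool:
        """Record a state notification; return True if the block advanced."""
        current = self.states.get(block_id)
        if current is not None and state <= current:
            return False  # duplicate or stale notification, ignore it
        self.states[block_id] = state
        return True


tracker = BlockStateTracker()
tracker.update("79c25", CommitState.PROPOSED)
tracker.update("79c25", CommitState.VOTED)
```

Using an `IntEnum` keeps comparisons such as "at least Voted" to a single `>=` check.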

Block numbers and block ids

Once a block is canonically appended to the blockchain, it becomes uniquely identified by a (sequentially increasing) block number, also called a "block height." The inclusion of a block on the blockchain is the goal of the consensus algorithm, and is called "finalization."

When a block is first constructed, it is constructed assuming it will become the next block number, N. But prior to finalization, consensus nodes are still trying to agree whether or not this candidate block will actually become block N.

In consensus terminology, a leader constructs a block and proposes it to the Monad network. Consensus nodes vote on the proposal of a particular candidate block B to become the finalized block with number N. We call this candidate block a "proposed block." It's not part of the blockchain yet, but it probably will be soon.

It's possible for the proposal to fail for many reasons. In the most common case, the proposed block does not reach enough other nodes before the timeout period expires, due to network issues.

In that case, you would see another candidate to become the same block number N later on. Because the real-time data feeds are fed by speculative execution, you might see blockchain data for both of the block N candidates, and will be told later which one was correct.

The critical thing to understand is that this blockchain data may claim to be for "block number N", but it is not the "real" block N yet: it's just a proposal to become block N.

Consequently, when you see real-time data for a block before it finalizes, the block number alone is not enough to uniquely identify it. This is only the block number that the block will have, if it eventually gets finalized. Instead, consensus uses a "block id" to uniquely identify proposed blocks. The id can be used to track a specific block through its commit state lifecycle.

Consider the following situation:

                                   ■
                                   ║  ┌─────────────┐   ┌─────────────┐
                                   ║  │             │   │             │
                         ┌─────────╬──┤  Block 102  ◀───┤  Block 103  │
                         │         ║  │  id: 79c25  │   │  id: 13a33  │
┌─────────────┐   ┌──────▼──────┐  ║  │             │   │             │
│             │   │             │  ║  └─────────────┘   └─────────────┘
│  Block 100  ◀───┤  Block 101  │  ║
│  id: 5b3a6  │   │  id: 6d585  │  ║
│             │   │             │  ║
└─────────────┘   └──────▲──────┘  ║  ┌─────────────┐
                         │         ║  │             │
                         └─────────╬──┤  Block 102  │
                                   ║  │  id: 3ed4d  │
                                   ║  │             │
                                   ║  └─────────────┘
                                   ║
                                   ║
                                 ◀ ║ ▶
                  Finalized blocks ║ Proposed blocks
                     (committed to ║ (may not become
                       blockchain) ║ committed to
                                   ║ blockchain)
                                   ║
                                   ■

In this diagram:

Block 101 is the latest block to be finalized
There are two competing proposed blocks vying to become block 102; they can be distinguished by their block ids
One of the proposed blocks (79c25) is the parent of another proposed block (13a33)
You might see real-time data for all of these blocks; for those which are not finalized, you can start your pipeline processing right away, but you may want to wait until they reach a better commitment state before acting

info

The above situation is very rare in practice, but you should be aware that it is possible.
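To make the diagram concrete, here is a small sketch that encodes its five blocks and answers two questions a consumer might ask: which block numbers have competing candidates, and which proposed blocks does a given block build on. The `Block` record and both helper functions are hypothetical, and the parent links follow the arrows as drawn.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    block_id: str
    number: int
    parent_id: Optional[str]
    finalized: bool


# The five blocks from the diagram above.
BLOCKS = [
    Block("5b3a6", 100, None, True),  # its parent is not shown in the diagram
    Block("6d585", 101, "5b3a6", True),
    Block("79c25", 102, "6d585", False),
    Block("3ed4d", 102, "6d585", False),
    Block("13a33", 103, "79c25", False),
]


def competing_candidates(blocks: List[Block]) -> Dict[int, List[str]]:
    """Return block numbers that have more than one unfinalized candidate."""
    by_number: Dict[int, List[str]] = defaultdict(list)
    for block in blocks:
        if not block.finalized:
            by_number[block.number].append(block.block_id)
    return {n: ids for n, ids in by_number.items() if len(ids) > 1}


def unfinalized_ancestors(blocks: List[Block], block_id: str) -> List[str]:
    """Walk parent links and collect ancestors that are still only proposed."""
    by_id = {b.block_id: b for b in blocks}
    result = []
    parent_id = by_id[block_id].parent_id
    while parent_id is not None and not by_id[parent_id].finalized:
        result.append(parent_id)
        parent_id = by_id[parent_id].parent_id
    return result
```

For the diagram's data, `competing_candidates(BLOCKS)` reports the two candidates for block 102, and `unfinalized_ancestors(BLOCKS, "13a33")` shows that block 103 depends on one of them.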

Consensus, execution, and commit states

Monad's real-time data stream is emitted directly by the EVM. In Category Labs' implementation of a Monad node, execution and consensus are decoupled, as they are in most Ethereum-compatible blockchain software. That is, consensus and execution are not just different algorithms, but completely separate programs that communicate with each other. Consensus is the algorithm (and the daemon) which decides whether or not a potential block will become part of the blockchain.

Execution hosts the EVM, and executes blocks on a speculative basis, before it is told the fate of the block by the consensus algorithm. Meanwhile, consensus is in the "driver's seat": it is the primary driver in the creation of new blocks, and execution acts as a service that consensus uses. However, only execution produces real-time data and maintains the state database, because it is the only thing that sees the details of every log, every call frame, etc.

The exact way that blocks progress from one state to the next is explained below. Note that a block's state is from the perspective of a particular observer -- for example if you receive a QC for a block, then you can move that block to the Voted state, but if your friend didn't receive that QC yet, then she would still consider that block to be in the Proposed state.

The challenge of building a distributed consensus mechanism lies in defining rules that allow nodes to individually update their state machines in response to messages even while assuming the worst, i.e. even while assuming that they might be the only one that received that message.

First commit state: `Proposed`

When a new block is proposed by a leader (a Monad validator node), it is sent from that leader's consensus node to all other consensus nodes to be voted on. From the perspective of each of those nodes (as well as any observers), if the block is valid (i.e., follows all protocol rules), it is in the Proposed state.

Upon receiving a valid block, each consensus node will send a "yes" vote to the next leader, while also scheduling it for immediate speculative execution by its local execution daemon.

Shortly after this happens -- even as the "yes" vote is starting to be transmitted over the Internet -- the execution daemon begins executing the proposed block in the EVM. A few milliseconds later, the EVM will begin publishing real-time data for this block.

In the current implementation of the software, all blockchain data is first observed in the Proposed state. A Proposed block is a tricky thing. Nearly 100% of proposals that are received do eventually become finalized. In a statistical sense then, seeing a Proposed block seems quite good.

However, that is because usually the global Monad network is functioning properly: the vast majority of the time, there are no major telecom outages on the Internet and there is no attempted malicious activity going on.

You are seeing the block so early that no one has voted for it except you (if you are a validator) and the leader that proposed it. The first stage vote is occurring in parallel, at roughly the same time that you are watching real-time data from its execution.

This is the paradox of a Proposed block: the transactions within it are almost certainly going to happen. And yet if you need very high levels of assurance before acting, it would be foolish to assume they definitely will: the blockchain's primary defense against errors, outages, and attacks (its consensus algorithm) has not weighed in yet.

Make sure you understand the implications of a block being in the Proposed state: it's very early, but has no defense against network problems, software errors, or malicious behavior. Each later stage reduces the likelihood of problems occurring, and the kinds of problems that can occur.

Second commit state: `Voted`

As mentioned above, the consensus algorithm is conducting a first-round vote on whether or not that block will be appended to the blockchain. The goal of this referendum is to produce a "quorum" (the minimum number of required votes for the referendum to pass) of "yes" votes.

The vote is coordinated by the second-round leader, who gathers a quorum of cryptographically-signed "yes" votes into an aggregate signature called a "quorum certificate" (QC). The second-round leader sends out that QC to all consensus nodes.

If you have received a QC on a block, that means that you have proof that the block passed the first round of voting. When this happens, you may consider the block to be in the Voted state.
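For intuition only, a quorum check might look like the sketch below. This document does not state the quorum threshold or how votes are weighted, so the sketch assumes the rule common to BFT protocols: the "yes" voters must hold more than two-thirds of the total stake. The function name and the validator names are hypothetical, and real QCs are verified through aggregate signatures rather than by counting names.

```python
from typing import Dict, Set


def has_quorum(stake: Dict[str, int], yes_voters: Set[str]) -> bool:
    """True if the "yes" voters hold more than two-thirds of total stake.

    The two-thirds threshold is an assumption, not taken from this document.
    """
    total = sum(stake.values())
    yes = sum(stake[v] for v in yes_voters if v in stake)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 3 * yes > 2 * total


# Four equally weighted validators: three "yes" votes pass, two do not.
stake = {"a": 1, "b": 1, "c": 1, "d": 1}
```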

A few things to note about the voted state:

`Voted` does not mean it's on the blockchain

A block cannot yet be definitively appended to the blockchain when it reaches the Voted state. Monad's consensus algorithm uses two rounds of voting. Possession of a QC (i.e. proof that the first round concluded in most participants voting yes) removes the most common kinds of risks, but some risk of being reverted remains.

Possessing a QC removes the risk that a block will be "lost" due to the most common issues such as network outages and latency issues. Even if it turned out that you were the only observer with this QC due to a severe network outage, Monad's consensus algorithm has a fallback mechanism that ensures that the original block will be reproposed and ultimately finalized under almost all conditions.

Under what conditions is the existence of a QC not enough? Several things must have happened, but the most notable is that the original leader must have equivocated, i.e. proposed two different blocks at the same block height, sending each to a different set of nodes.

Equivocation is an unlikely thing for a leader to do, since it is an easily attributable fault (proof is just the pair of conflicting blocks signed by the same leader), and since the leader only hurts themselves by invalidating their own proposal. This is why a block that has reached the Voted stage is very likely to be finalized.

A block might never enter the `Voted` state

A proposed block might never receive a QC, if its first consensus vote fails. The most common reason that a block does not get voted in is network latency issues. Suppose, for example, that both your node and the leader are in Australia, and there is significant congestion with the cross-continental network traffic. In that case, most of the blockchain nodes may not learn about the proposal before the timeout period expires, and the vote will fail. Note that you will not be told that the vote fails. The failure of the vote is implicit, but you can tell it happened because of the next property.

For some block number `N`, some proposed block will eventually be voted in

If you do not receive a QC for a particular block with block number N, then at some other time you will receive a different proposed block to become block N and that will receive a QC.

A sequence like the following may occur:

You see all of the real-time data for some block B1, which is proposed to become block number N
You see a different block, B2 (and all of its execution events) also competing to become block N
B2 receives a QC, and this is the only thing that happens. Namely, B1 does not receive an explicit "abandonment" event: it is just never mentioned again, and is implicitly abandoned by the network endorsing another block with the same number
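Because no explicit abandonment event arrives, a consumer has to infer it. The sketch below replays a sequence like the one above and reports proposals whose competitor at the same block number received a QC. The event tuple shape and the function name are hypothetical, and the result is only a strong hint: finalization remains the definitive signal.

```python
from typing import Dict, List, Tuple

# (kind, block_id, block_number); a hypothetical event shape for illustration
Event = Tuple[str, str, int]


def likely_abandoned(events: List[Event]) -> List[str]:
    """Return proposed block ids whose competitor at the same number got a QC."""
    proposed: Dict[str, int] = {}  # block id -> block number
    voted: Dict[int, str] = {}     # block number -> id of the block with a QC
    for kind, block_id, number in events:
        if kind == "proposed":
            proposed[block_id] = number
        elif kind == "voted":
            voted[number] = block_id
    return [bid for bid, n in proposed.items() if n in voted and voted[n] != bid]


events = [("proposed", "B1", 7), ("proposed", "B2", 7), ("voted", "B2", 7)]
```

With these events, `likely_abandoned(events)` returns `["B1"]`; before the QC for B2 arrives, it returns an empty list.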

Third commit state: `Finalized`

Finalized means the block is now part of the canonical blockchain and cannot be reverted without a hard fork. From this point forward, we no longer need the block id and can refer to a block solely by its block number.

When a block with number N is finalized, it implicitly abandons all other proposed blocks with the same block number N. Such blocks could be in either the proposed or the voted state. We say the abandonment is implicit because no event will be recorded to explicitly announce the abandonment of a previously seen block id.

If you are using pipelining programming techniques when you consume real-time data, then you are probably keeping track of some state associated with unfinalized blocks. In our trading example, this would be the pre-prepared buy or sell order transaction message.

Every time a block is finalized, you must check for any blocks (1) with the same block number, but (2) with a different id, and abort your pipelined processing for those blocks. They will never be appended to the blockchain and the associated transactions and their effects -- whose execution events you have already seen -- will never occur.
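The cleanup rule above can be sketched as a single function. The `Pending` store and the `finalize` helper are hypothetical; the point is only the filter on "same block number, different id".

```python
from typing import Dict, List, Tuple

# block id -> (block number, pipelined work); a hypothetical in-memory store
Pending = Dict[str, Tuple[int, str]]


def finalize(pending: Pending, block_id: str, number: int) -> List[str]:
    """Drop work for every other candidate at this number; return aborted ids."""
    aborted = [
        bid for bid, (n, _) in pending.items() if n == number and bid != block_id
    ]
    for bid in aborted:
        del pending[bid]
    return aborted


pending: Pending = {
    "79c25": (102, "presigned buy"),
    "3ed4d": (102, "presigned sell"),
    "13a33": (103, "presigned buy"),
}
aborted = finalize(pending, "79c25", 102)
```

Here the competing candidate for block 102 is aborted, while the work for the finalized block and for the still-pending block 103 is kept.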

Why are there two different stages of voting?

This is because the subject of the vote -- the thing we are trying to get agreement on -- is different.

In the first vote, the network wants to verify that a block satisfies all the protocol rules. If a QC is obtained, we know that a majority of the honest nodes agree that the block should be added. This first vote is called the "voted" stage since the block itself has been voted for.
In the second vote, we want to get cryptographically secure agreement that enough of the network has actually seen the result of the first vote. This second vote is called the "finalized" stage since, now that everyone knows the first vote succeeded, they can also agree that it must be the next block on the blockchain.

The second vote is trying to answer the question posed by the old saying "If a tree falls in a forest and no one is around to hear it, does it make a sound?"

Consider how the "fan-in, fan-out" linear communication pattern of the BFT protocol works. We talk about nodes "having a QC", but the QC is computed by the leader of the next round -- this leader is the one actually "conducting" the vote.

Even if the leader is dishonest, it cannot forge the vote because it cannot forge other nodes' cryptographic signatures. But it can fail to successfully tell enough of its peers about the QC, since it must communicate the QC to everyone. Network problems are common, so communication can always fail.

Thus we need a second round of voting, for validators to reach distributed agreement about the fact that they've seen the first QC. Now it's safe to assume that everyone (or at least the honest majority) will have the same canonical blockchain.

The reason that Voted is a very reliable commit state on Monad is that if common problems occur during the second stage vote, the algorithm will continue trying to conduct the second stage vote, failing only in some narrow corner cases involving equivocation.

Fourth commit state: `Verified`

The consensus algorithm produces one last state transition for a block, called Verified.

The Verified state is a consequence of Monad's asynchronous execution. Recall that a consensus node votes on a block before its execution is complete. This implies that consensus must be voting on the block's execution inputs, but not on the block's execution outputs.

Consensus nodes cannot be voting, for example, on the correct value of the state root produced by the execution of the block, because they don't know what it is (remember, it is being computed in parallel with the vote occurring). This means that the Voted state does not certify the correctness of any output fields in the Ethereum block header such as state_root, receipts_root, etc.

Voting on the inputs alone is possible because any well-formed Ethereum block will have completely deterministic effects on the blockchain state, when executed by a conforming EVM implementation. Thus, it is safe to append a block onto the blockchain, knowing that everyone will agree on its behavior, even if we don't know exactly what the behavior will be.

Clearly though, for the blockchain to be reliable, consensus nodes must eventually vote on the correctness of the execution outputs. Suppose they did not, and further suppose that a bug existed in some execution nodes but not in others (perhaps running a different version of the client software). If there were no mechanism to feed the execution outputs back into consensus decisions, the state could become forked without anyone noticing. Consensus proposals start by assuming that all execution nodes will compute the correct state, but to prevent bugs and malicious actions from compromising the network, it must check that this happens eventually.

Here is how Monad solves this issue:

To give execution ample time to finish, execution outputs for block B are not incorporated into the consensus protocol until three rounds in the future, alongside the proposal of block B+3. When this proposal is finalized (ideally two rounds later, during the proposal of block B+5), then a supermajority of nodes will have voted for the correct values of the execution outputs.
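The timing in the paragraph above is simple arithmetic, shown here as a sketch. The constant and function names are hypothetical, and the second figure is the ideal case described above, not a guarantee.

```python
EXECUTION_DELAY_ROUNDS = 3  # outputs of block B ride along with the proposal of B+3
ROUNDS_TO_FINALIZE = 2      # ideal case: that proposal finalizes two rounds later


def verification_schedule(b: int) -> dict:
    """Return the block numbers involved in verifying block b's outputs."""
    carrier = b + EXECUTION_DELAY_ROUNDS
    return {
        "outputs_carried_by_proposal": carrier,
        "verified_during_proposal": carrier + ROUNDS_TO_FINALIZE,
    }
```

For example, the execution outputs of block 100 would be carried by the proposal of block 103 and, ideally, verified during the proposal of block 105.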

To roughly summarize the difference:

Finalized means the block's definition (i.e., its transactions) is definitely part of the blockchain
Verified means that a supermajority stake's worth of other nodes have verified that your local node's computation of the state changes of these transactions matches the supermajority's

To understand more, read here.

Does this imply that the Verified state is the gold standard for "100%, can never fail" transaction reporting? No, not if you are trusting a data feed produced by a single node. Who is to say, for example, that the node producing the data is not suffering from a bug, or has not been hacked?

As always in the blockchain universe, critical transactions that demand total peace of mind can only be verified by widespread agreement, broadly defined. This includes explicitly checking with other nodes, to ensure no hosts or network intermediaries have been compromised and the software is working properly.

The Voted state is usually good enough for most purposes: it should very rarely revert in practice, so it is also used as the "safe" block tag in Monad's RPC implementation.
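One way to apply this guidance is to decide, per action, the minimum commit state you are willing to act on. The sketch below is illustrative only: the action names and the thresholds chosen for them are hypothetical, and each application must pick its own.

```python
from enum import IntEnum


class CommitState(IntEnum):
    PROPOSED = 1
    VOTED = 2
    FINALIZED = 3
    VERIFIED = 4


# Illustrative thresholds only; they are assumptions, not recommendations.
REQUIRED_STATE = {
    "show_pending_in_ui": CommitState.PROPOSED,
    "prepare_trade": CommitState.PROPOSED,
    "refresh_dashboard": CommitState.VOTED,
    "send_trade": CommitState.FINALIZED,
    "book_settlement": CommitState.VERIFIED,
}


def may_act(action: str, state: CommitState) -> bool:
    """True once the block has reached the state this action requires."""
    return state >= REQUIRED_STATE[action]
```

A latency-sensitive strategy might lower the threshold for `send_trade`, accepting the occasional reaction to a block that never finalizes, as discussed in the section on reacting early.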
