> ## Documentation Index > Fetch the complete documentation index at: https://docs.monad.xyz/llms.txt > Use this file to discover all available pages before exploring further. # Event rings in detail ## Event ring files and content types Event rings are made up of four shared memory segments. Two of these -- the event descriptor array and payload buffer -- are described in the [overview documentation](/execution-events/overview). The third shared memory segment contains a header that describes metadata about the event ring. The fourth (the "context area") is a special feature that is not needed for execution events. The shared memory segments are mapped into a process' address space using [mmap(2)](https://man7.org/linux/man-pages/man2/mmap.2.html). This means that the event ring's data structures live in a file somewhere, and that shared access to it is obtained by creating shared memory mappings of that file. Most of the time an event ring is a regular file, created on a special in-memory file system called [hugetlbfs](https://lwn.net/Articles/375096/). hugetlbfs is similar to the [tmpfs](https://man7.org/linux/man-pages/man5/tmpfs.5.html) in-memory filesystem, but supports the creation of files backed by large page sizes. The use of large pages is just an [optimization](https://lwn.net/Articles/374424/): event ring files may be created on any file system. If the execution daemon is told to create an event ring file on a filesystem without hugetlb mmap support, it will log a performance warning but will still create the file. To learn more about hugetlbfs and how it is used, read [this page](/execution-events/advanced#location-of-event-ring-files). ### Event ring configuration To use execution events, the execution daemon must be started with the command line parameter: ``` --exec-event-ring [] ``` Without this command line parameter, execution will not publish any events. This command line parameter (and mounting a hugetlbfs filesystem, for that matter) are not part of the default configuration instructions for the execution daemon. A [separate guide](https://validator-docs.vercel.app/docs/full_node/events-and-websockets) covers mounting a hugetlbfs filesystem and modifying the command line in the systemd unit configuration files. Note that the configuration string is optional; if you pass `--exec-event-ring` without an argument (which is the recommended thing to do), this is equivalent to passing `--exec-event-ring monad-exec-events`, where `monad-exec-events` is the default execution event ring file name. The event ring configuration string has the form: ``` [::] ``` In other words, the configuration string consists of three `:`-separated fields; the first field is required but the second two are optional. Here is an example of the command line parameter, with just the first field: ``` --exec-event-ring /var/lib/hugetlbfs/user/monad/pagesize-2MB/event-rings/monad-exec-events ``` and another example, with all three: ``` --exec-event-ring monad-exec-events:21:29 ``` The first field is the name of the event ring file. The execution daemon interprets this field in two different ways: * If it is purely a file name -- i.e., if the path does not contain any `/` characters -- then it is interpreted as a file living in the default event ring file directory; this is the directory returned by the API function `monad_event_open_ring_dir_fd`; it uses [libhugetlbfs](https://github.com/libhugetlbfs/libhugetlbfs) to locate the most suitable mount point for a hugetlbfs file system, and automatically creates subdirectory called `event-rings` underneath that, if it does not already exist (see [here](/execution-events/advanced#location-of-event-ring-files) for more information); the event ring file will be created in that `event-rings` subdirectory * If the path has multiple path components -- i.e., if it contains at least one `/` character -- then this path will be used as written *even if* it is not resident on a hugetlbfs file system; the rationale for why someone might want this is explained below The "shift" parameters are power-of-two exponents that determine the event ring's size.[^1] A `` of 21 means there will be 2^21 descriptors in the ring's event descriptor array. This means approximately 2 million events can be written before the descriptor ring buffer wraps around and overwrites older event descriptors. A `` of 29 means there will be 2^29 bytes in the payload buffer array. This means 512 MiB worth of event payloads can be recorded before the payload buffer wraps around and overwrites an older event's payload. If the event ring size parameters are not specified, then default values are used. Why might you increase these values from their defaults? If your reader program crashes, but execution does not, then you will likely miss some events during the time when your program is not running. Your application might not care about lost events for old blocks, but if it does, you'll need to retrieve them somehow. If not much time has gone by, chances are good that the events you are missing are still sitting there in the event ring memory, i.e., are not overwritten yet. The default sizes are large enough to hold several minutes worth of blocks at 10k TPS. Increasing these values allows you to rewind further back in time. Be aware that there is a fixed-size pool of huge pages. The pool size can be changed by modifying the system's configuration, see the discussion of `/proc/sys/vm/nr_hugepages` [here](https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt). If an event ring is created on a hugetlbfs mount, and its size exceeds the number of available huge pages, then the execution daemon will exit with an error message that reports a "no space left on device" error. For example, here we tried to allocate an event ring with a one terabyte payload buffer: ``` LOG_ERROR event library error -- monad_event_ring_init_simple@event_ring_util.c:78: posix_fallocate failed for event ring file `/dev/hugepages/monad-exec-events`, size 1099647942656: No space left on device (28) ``` If you happen to have multiple terabytes of main memory available, you could pass a file path containing a `/` character to a file on a tmpfs mount and it would work, e.g., `--exec-event-ring /my-giant-tmpfs/monad-exec-events`. If you need to rewind back especially far -- or if you are missing events because execution itself has crashed -- then you will need to use alternative recovery methods described elsewhere.[^2] [^1]: They are called "shifts" because `1UL << x` is equal to `2^x` [^2]: The alternative recovery methods are still in development and will be available in the next SDK release ### Event ring file format The event ring file format is simple: all four sections are laid out sequentially and aligned to a large page boundary, and the header describes the size of each section. ``` ╔═Event ring file══╗ ║ ┌──────────────┐ ║ ║ │ │ ║ ║ │ Header │ ║ ║ │ │ ║ ║ ├──────────────┤ ║ ║ │ │ ║ ║ │ Event │ ║ ║ │ Descriptor │ ║ ║ │ Array │ ║ ║ │ │ ║ ║ ├──────────────┤ ║ ║ │ │ ║ ║ │ │ ║ ║ │ │ ║ ║ │ │ ║ ║ │ Payload │ ║ ║ │ Buffer │ ║ ║ │ │ ║ ║ │ │ ║ ║ │ │ ║ ║ │ │ ║ ║ │ │ ║ ║ ├──────────────┤ ║ ║ │ │ ║ ║ │ Context │ ║ ║ │ Area │ ║ ║ │ │ ║ ║ └──────────────┘ ║ ╚══════════════════╝ ``` The descriptor array is a just an array of `struct monad_event_descriptor` objects, and the payload buffer is a flat byte array (i.e., it has type `uint8_t[]`). The header structure is defined this way: ```c theme={null} /// Event ring shared memory files start with this header structure struct monad_event_ring_header { char magic[6]; ///< 'RINGvv', vv = version number enum monad_event_content_type content_type; ///< Kind of events in this ring uint8_t schema_hash[32]; ///< Ensure event definitions match struct monad_event_ring_size size; ///< Size of following structures struct monad_event_ring_control control; ///< Tracks ring's state/status }; ``` ### Event content types The `content_type` header field is needed because the event ring library -- both the reader and writer APIs -- performs unstructured I/O: the functions read and write raw `uint8_t[]` event payloads, and the event descriptors contain plain `uint16_t` numerical event codes. Much like the UNIX `read(2)` and `write(2)` file I/O system calls, the event ring API functions do not inherently know the format of data they are working with. This is the reason why the `event_type` field in the event descriptor is the generic integer type `uint16_t` instead of `enum monad_exec_event_type`. The assumption here is that the reader and writer know the binary format of the data they're both working with, and they treat the raw data as if it has this format by type-casting it when needed, e.g., ```c theme={null} const struct monad_exec_block_start *block_start = nullptr; // We assume that the event ring file we opened contains execution events, // and thus further assume that it makes sense to compare `event->event_type` // to a value of type `enum monad_exec_event_type` if (event->event_type == MONAD_EXEC_BLOCK_START) { // Since this is MONAD_EXEC_BLOCK_START, we can cast the `const void *` // payload to a `const struct monad_exec_block_start *` payload. // Note: implicit type-casting from `void *` is allowed in C, but not C++ block_start = monad_event_ring_payload_peek(event_ring, event); } ``` We need some kind of error-detection mechanism to ensure this is safe to do. The event ring file header contains a "content type" enumeration constant explaining what kind of event data it contains: ```c theme={null} enum monad_event_content_type : uint16_t { MONAD_EVENT_CONTENT_TYPE_NONE, ///< An invalid value MONAD_EVENT_CONTENT_TYPE_TEST, ///< Used in simple automated tests MONAD_EVENT_CONTENT_TYPE_EXEC, ///< Core execution events MONAD_EVENT_CONTENT_TYPE_PERF, ///< Performance tracer events MONAD_EVENT_CONTENT_TYPE_COUNT ///< Total number of known event rings }; ``` The execution events are always recorded to a ring with `content_type` equal to `MONAD_EVENT_CONTENT_TYPE_EXEC`. ### Binary schema versioning: the `schema_hash` field If `content_type` is equal to `MONAD_EVENT_CONTENT_TYPE_EXEC`, then we know a ring is supposed to execution events, but what if the event payload definitions change? Or what if the enumeration constants in `enum monad_exec_event_type` change? Suppose that a user compiles their application with a particular version of `exec_event_ctypes.h`, the file which defines the execution event payloads and the event type enumeration. Now imagine that some time later, the user deploys a new version of the execution node, which was compiled with a different version of `exec_event_ctypes.h`, causing the memory representation of the event payloads to be different. If the reader does not remember to recompile their application with the new header, it could misinterpret the bytes in the event payloads, assuming they have the old layout from their old (compile-time) version of `exec_event_ctypes.h`. To prevent these kinds of errors, the binary layout of all event payloads is summarized by a hash value which changes any time a change is made to any event payload for that content type. In addition to payload changes, any change to `enum monad_exec_event_type` will also generate a new hash. This mechanism is called the "schema hash", and the hash value is present as a global, read-only byte array inside the library code (defined in `exec_event_ctypes_metadata.c`). If the hash value in this array does not match the hash value in the event ring file header, then the binary formats are incompatible. A helper function called `monad_event_ring_check_content_type` is used to check that an event ring file has both the expected content type, and the expected schema hash for that content type. Here is an example of it being called in the `eventwatch.c` sample program: ```c theme={null} struct monad_event_ring exec_ring; /* initialization of `exec_ring` not shown */ if (monad_event_ring_check_content_type( &exec_ring, MONAD_EVENT_CONTENT_TYPE_EXEC, g_monad_exec_event_schema_hash) != 0) { errx(EX_SOFTWARE, "event library error -- %s", monad_event_ring_get_last_error()); } ``` If the type of event ring is not `MONAD_EVENT_CONTENT_TYPE_EXEC` or if the `schema_hash` in the file header does not match the value contained in the global array `uint8_t g_monad_exec_event_schema_hash[32]`, this function will return the `errno(3)` domain code `EPROTO`. ## Event descriptors in detail ### Binary format The event descriptor is defined this way: ```c theme={null} struct monad_event_descriptor { alignas(64) uint64_t seqno; ///< Sequence number, for gap/liveness check uint16_t event_type; ///< What kind of event this is uint16_t : 16; ///< Unused tail padding uint32_t payload_size; ///< Size of event payload uint64_t record_epoch_nanos; ///< Time event was recorded uint64_t payload_buf_offset; ///< Unwrapped offset of payload in p. buf uint64_t content_ext[4]; ///< Extensions for particular content types }; ``` ### Flow tags: the `content_ext` fields in execution event rings For each content type, we may want to publish additional data directly in the event descriptor, e.g., if that data is common to every payload type or if it would help the reader quickly filter out events they are not interested in, without needing to examine the event payload. This additional data is stored in the `content_ext` ("content extensions") array, and its meaning is defined by the `content_type`. For execution event rings, the first three values of the `content_ext` array are sometimes filled in. The value at each index in the array has the semantic meaning described by the following enumeration type, which is defined in `exec_event_ctypes.h`: ```c theme={null} /// Stored in event descriptor's `content_ext` array to tag the /// block & transaction context of event enum monad_exec_flow_type : uint8_t { MONAD_FLOW_BLOCK_SEQNO = 0, MONAD_FLOW_TXN_ID = 1, MONAD_FLOW_ACCOUNT_INDEX = 2, }; ``` For example, if we have an event descriptor ```c theme={null} struct monad_event_descriptor event; ``` And its contents are initialized by a call to `monad_event_iterator_try_next`, then `event.content_ext[MONAD_FLOW_TXN_ID]` will contains the "transaction ID" for that event. The transaction ID is equal to the transaction index plus one, and it is zero if the event has no associated transaction (e.g., the start of a new block). The idea behind the "flow" tags is that they tag events with the context they belong to. For example, when a transaction accesses a particular account storage key, a `STORAGE_ACCESS` event is emitted. By looking at the `content_ext` array for the `STORAGE_ACCESS` event descriptor, the reader can tell it is a storage access made (1) by the transaction with index `event.content_ext[MONAD_FLOW_TXN_ID] - 1` and (2) to the account with index `event.content_ext[MONAD_FLOW_ACCOUNT_INDEX]` (this index is related to an earlier `ACCOUNT_ACCESS` event series that will have already been seen). Flow tags are used for two reasons: 1. **Fast filtering** - if we are processing 10,000 transactions per second, and there are at least a dozen events per transaction, then we only have about 10 microseconds to process each event or we'll eventually fall behind and gap. At timescales like these, even touching the memory containing the event payload is expensive, on a relative basis. The event payload lives on a different cache line -- one that is not yet warm in the reader's CPU -- and the cache line ownership must first be changed in the cache coherence protocol (because it was recently exclusively owned by the writer, and now must be shared with the reading CPU, causing cross-core bus traffic). For most applications, the user can identify transactions IDs they are interested in at the time of the `TXN_HEADER_START` event, and then any event without an interesting ID can be ignored. Because the IDs are a dense set of integers, a simple array of type `bool[TXN_COUNT + 1]` can be used to efficiently look up whether subsequent events associated with that transaction are interesting (this can be made even more efficient using a single bit instead of a full `bool` per transaction) 2. **Compression** - the account of a `STORAGE_ACCESS` is referred to by an index (which refers to an earlier `ACCOUNT_ACCESS` event) because an account address is 20 bytes: large enough that it cannot fit in the two remaining `content_ext` array slots The compression technique is also used for storing the block associated with an event, in `event.content_ext[MONAD_FLOW_BLOCK_SEQNO]`. The flow tag in this case is the *sequence number* of the `BLOCK_START` event that started the associated block. A few things to note about this flow tag: * Sometimes it is zero (an invalid sequence number), which means the event is not associated with any block; although most events are scoped to a block, the consensus state change events (`BLOCK_QC`, `BLOCK_FINALIZED`, and `BLOCK_VERIFIED`) do not occur inside a block * Note that the block flow tag is *not* the block number. This is because at the time events are seen, the blocks are in the "proposed" state, and the consensus algorithm has not finished voting on whether or not the block will be included in the canonical blockchain (this is discussed extensively in the next section). Until a block becomes finalized, the only unambiguous way to refer to it is by its unique ID, which is 32-byte hash value (which can be read from the `BLOCK_START` payload); thus the block flow tag is also a form of compression * Having the sequence number allows us to rewind the iterator to the start of the block, if we start observing the event sequence in the middle of a block (e.g., if the reader starts up after the execution daemon). An example of this (and a detailed explanation of it) can be found in the `eventwatch.c` example program, in the `find_initial_iteration_point` function