Announcing bytecode-generated storage layouts on evm.storage

Published in smlXL · 4 min read · Oct 24, 2023

We’re improving visibility into unverified contracts on evm.storage by representing them with storage layouts generated from contract bytecode. We’re also open-sourcing the code for our Storage Layout Extractor.

As a company, we aim to make public computers (blockchains) more transparent, useful, and accessible. You can’t understand a machine without understanding its moving parts. For Ethereum, these are its smart contracts.

We can’t fully understand what a contract really does without inspecting its state, i.e., its storage slots. With evm.storage, we’ve made strides in representing storage slots and the variables that occupy them for verified contracts. However, our understanding will never be complete if it’s limited to verified contracts. The industry has yet to solve the problem of examining and making accessible the state of unverified contracts.

While we’ve tracked over 59MM contract deployments on Ethereum, we’re only able to present ~5MM of these with verified source code-based layouts. Of those 5MM, we see only about 500K distinct code hashes. A tremendous amount of blockchain activity runs through unverified contracts.

Today, we are excited to announce that evm.storage now provides bytecode-generated layouts for unverified Solidity contracts. This allows anyone to view and track the storage of these previously opaque contracts.

We are also open-sourcing the code (under the AGPL license) for a key component: our Storage Layout Extractor tool. The tool symbolically executes a contract’s bytecode to infer its storage layout. While we’re early in the tool’s lifecycle, we are quite happy with the initial results. In preliminary benchmarks, we were able to compute layouts for more than 120K contracts in under 110ms on average. At times the resulting layouts are even more accurate than those generated by solc with source code! You can read more about this in our technical deep dive.

When users land on an unverified contract in evm.storage, they will be presented with the bytecode-generated layout of that contract. It will look and feel similar to the experience for verified contracts, only without variable labels. These can’t be inferred from bytecode, and while we’re creative souls, we figured we shouldn’t make them up! A good example is Opensea’s Openstore contract, which is unverified: you can see the generated layout and the storage history for its mappings and variables. Raw layouts are also still available.

evm.storage — Opensea Shared Storefront example

One evm.storage feature we love is the ability to see the historical state of slots, variables, mapping keys, etc. This is now also supported for unverified Solidity contracts, but it works a bit differently. You’ll be able to browse a bytecode-generated layout immediately (unless we have a bug… and, if so, please report it!). However, we’ll only add the contract to our storage history processing queue upon the first user visit. The history will become available gradually as data for the contract starts streaming from our infrastructure. While the data is backfilling, you will see a banner at the top of the contract page. This “on-demand storage tracking” setup helps us prioritize the unverified contracts our users are interested in (or it’s a ploy to make you visit our site twice, thereby further inflating our sky-high usage). We track all verified contracts by default.

The backstory and details

Given our mission, we have to support unverified contracts. At the same time, we believe that a vast majority of our users are currently interested in interacting primarily with verified Solidity contracts. So, it both had to be done and was hard to prioritize. We feel similarly about Vyper/Huff/ASM, for what it’s worth (and we will get it done). Early on, we scoped out the work required to support unverified contracts, but it lingered in our backlog.

Having experience in writing analysis tools that depend on the contract’s bytecode rather than its source code, we initially proposed a static analysis-based solution, including (roughly):

Generating a Control Flow Graph (CFG) from the contract’s bytecode.
Scanning the basic blocks for storage-related opcodes (SLOAD/SSTORE).
Tracing forward and backward from the storage operations to find out the subject storage slot and its data type.
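
To make the first two steps of that initial proposal concrete, here is a minimal Python sketch. It is not the code we shipped: the function names (`disassemble`, `basic_blocks`, `storage_accesses`) are made up for illustration, the block splitting is simplified, and it does not resolve jump targets into CFG edges. The opcode values are the standard EVM ones.

```python
# Standard EVM opcode values.
SLOAD, SSTORE = 0x54, 0x55
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F
# STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT end a basic block.
BLOCK_ENDERS = {0x00, 0x56, 0x57, 0xF3, 0xFD, 0xFE, 0xFF}


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate) triples, skipping over PUSH data."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1 : pc + 1 + size]
        pc += 1 + size


def basic_blocks(code: bytes):
    """Split the instruction stream into basic blocks (hypothetical helper)."""
    blocks, current = [], []
    for ins in disassemble(code):
        if ins[1] == JUMPDEST and current:  # a jump target starts a new block
            blocks.append(current)
            current = []
        current.append(ins)
        if ins[1] in BLOCK_ENDERS:  # control flow leaves the block here
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def storage_accesses(code: bytes):
    """Return (block_index, offset, name) for every SLOAD/SSTORE found."""
    names = {SLOAD: "SLOAD", SSTORE: "SSTORE"}
    return [
        (index, pc, names[op])
        for index, block in enumerate(basic_blocks(code))
        for pc, op, _ in block
        if op in names
    ]


# PUSH1 0x2a, PUSH1 0x00, SSTORE, PUSH1 0x00, SLOAD, STOP
print(storage_accesses(bytes.fromhex("602a60005560005400")))
# [(0, 4, 'SSTORE'), (0, 7, 'SLOAD')]
```

Skipping PUSH immediates matters here: a data byte such as 0x55 inside a PUSH would otherwise be mistaken for an SSTORE. The third step, tracing from each access to its slot and data type, is where most of the difficulty lies and is not attempted in this sketch.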

As time went by, it remained hard to justify taking on this work given the opportunity cost. We also anticipated that it would come with a lot of “gotchas” and edge cases. So we started looking at ways to tackle it while minimizing the impact on our top priorities. We kept our eyes open for the right people to take this on as a project, and eventually came across Marcin and Ara from Reilabs. They both had experience with compilers and VMs and were into crypto. It seemed like a great match, so we contacted Reilabs and discussed embedding Ara or Marcin in a small “tiger team.” They were excited, and Ara had availability, so we got to work, picking up from where we had left off. After a few conversations with Ara, we decided that it would be beneficial to take an even more ambitious approach:

Disassemble the contract’s bytecode to a stream of EVM instructions (opcodes).
Symbolically execute the above stream on a custom EVM implementation that explores all reachable execution paths, i.e., take both branches in a JUMPI instruction.
For each value observed throughout execution, have the VM build a symbolic value that represents the operations performed on that particular piece of “data.”
Pass these symbolic values (trees, really) through a type-inference process and finally build a storage layout from the results.
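
The steps above can be illustrated with a toy Python sketch. It is only a sketch under heavy assumptions and not the extractor's actual code: it supports a handful of opcodes, represents symbolic values as nested tuples, and the "type inference" is a single made-up heuristic (`guess_width_bits`) that treats a value masked with 2^n - 1 as an n-bit integer. All function names are hypothetical.

```python
# Standard EVM opcode values (tiny subset only).
STOP, ADD, AND = 0x00, 0x01, 0x16
CALLDATALOAD, SLOAD, SSTORE = 0x35, 0x54, 0x55
JUMPI, JUMPDEST, PUSH1 = 0x57, 0x5B, 0x60
BINARY = {ADD: "add", AND: "and"}


def explore(code: bytes):
    """Symbolically run `code`; return one (slot_tree, value_tree) per SSTORE."""
    stores = []
    worklist = [(0, ())]  # pending paths: (program counter, stack snapshot)
    forked = set()        # jump targets already queued, so loops terminate
    while worklist:
        pc, snapshot = worklist.pop()
        stack = list(snapshot)
        while pc < len(code):
            op = code[pc]
            if op == PUSH1:
                immediate = int.from_bytes(code[pc + 1 : pc + 2], "big")
                stack.append(("const", immediate))
                pc += 1  # skip the immediate byte
            elif op in BINARY:
                stack.append((BINARY[op], stack.pop(), stack.pop()))
            elif op == CALLDATALOAD:
                stack.append(("calldata", stack.pop()))
            elif op == SLOAD:
                stack.append(("sload", stack.pop()))
            elif op == SSTORE:
                slot, value = stack.pop(), stack.pop()
                stores.append((slot, value))
            elif op == JUMPI:
                dest, _cond = stack.pop(), stack.pop()
                # Take both branches: queue the target, keep falling through.
                if (dest[0] == "const" and dest[1] < len(code)
                        and code[dest[1]] == JUMPDEST and dest[1] not in forked):
                    forked.add(dest[1])
                    worklist.append((dest[1], tuple(stack)))
            elif op != JUMPDEST:
                break  # STOP, or an opcode outside this sketch: end the path
            pc += 1
    return stores


def guess_width_bits(value):
    """Toy heuristic (an assumption): `x & (2**n - 1)` suggests an n-bit value."""
    if value[0] == "and":
        for arg in value[1:]:
            if arg[0] == "const" and arg[1] > 0 and (arg[1] + 1) & arg[1] == 0:
                return arg[1].bit_length()
    return 256  # default to a full EVM word


def infer_layout(stores):
    """Map each constant slot index to a guessed width in bits."""
    return {slot[1]: guess_width_bits(value)
            for slot, value in stores if slot[0] == "const"}


# if calldata[0]: slot1 = calldata[32] & 0xff   else: slot0 = 1
program = bytes.fromhex("600035600c576001600055005b60ff6020351660015500")
print(infer_layout(explore(program)))  # {0: 256, 1: 8}
```

Because both sides of the JUMPI are explored regardless of the condition, the sketch sees the writes to slot 0 and slot 1 even though no single concrete execution would perform both.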

This approach might not suit other architectures, which make use of registers and execute huge programs. We were confident, however, that it would work fairly well for a special architecture like the EVM: it is a simple stack machine, and network rules impose a strict size limit on the smart contracts it executes.

To read more about our benchmarks and technical details, please check out our technical deep-dive post and the repo, and join our Telegram group if you have any questions.

Written by Dor