PUSH1: or "Parsing EVM Bytecode"

December 28th, 2022

Have you ever encountered a JUMPDEST opcode (5B) in Ethereum bytecode and wondered how the execution can jump there? Look at these delicious 5Bs, so many places the code can run from!

If there is a 5B you can always jump there, right?…

Wrong!

And the answer has to do with something called instruction boundaries.

What are instruction boundaries and why do they matter?

When the Ethereum Virtual Machine (EVM) processes bytecode, it reads the code byte by byte and determines where each instruction starts and ends. These are known as instruction boundaries. Most opcodes occupy a single byte, with the exception of the PUSH opcodes. There are 32 PUSH opcodes, ranging from PUSH1 (60) to PUSH32 (7F), and each is followed by 1 to 32 bytes of data, so a single instruction at the program counter takes more than one byte of bytecode. When the EVM encounters an opcode in the 60-7F range, it determines which PUSH it is and the size of the accompanying payload (opcode minus 5F), treats the opcode and its payload as a single instruction, and continues at the first byte after the payload.
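To make the boundary rule concrete, here is a minimal TypeScript sketch that walks a hex string and records where every instruction starts. The function names (pushDataSize, hexToBytes, instructionStarts) and the sample bytecode are made up for illustration and are not taken from any particular client.

```typescript
// Number of payload bytes that follow an opcode.
// PUSH1..PUSH32 are 0x60..0x7f and carry 1..32 bytes; every other opcode carries none.
function pushDataSize(opcode: number): number {
  return opcode >= 0x60 && opcode <= 0x7f ? opcode - 0x5f : 0;
}

// Convert a hex string such as "6001615b5b01" into an array of byte values.
function hexToBytes(hex: string): number[] {
  const clean = hex.replace(/^0x/, "");
  if (clean.length % 2 !== 0) {
    throw new Error("hex string must have an even number of digits");
  }
  const bytes: number[] = [];
  for (let i = 0; i < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }
  return bytes;
}

// Walk the code once and record the offset at which every instruction starts.
function instructionStarts(bytes: number[]): number[] {
  const starts: number[] = [];
  let pc = 0;
  while (pc < bytes.length) {
    starts.push(pc);
    // Skip the opcode itself plus any PUSH payload.
    pc += 1 + pushDataSize(bytes[pc]);
  }
  return starts;
}

// 60 01 | 61 5b 5b | 01  =>  PUSH1 01, PUSH2 5b5b, ADD
const starts = instructionStarts(hexToBytes("6001615b5b01"));
console.log(starts); // [0, 2, 5]
```

Offsets 3 and 4 both hold a 5B byte, but neither appears in the result: they are payload of the PUSH2 at offset 2, not instructions.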

But what does this have to do with JUMPDESTs? A 5B is a 5B in the end! You see it in code - you jump!

Not really. The Ethereum Yellow Paper has a separate section about the Validity of Jump Destinations (9.4.3).

It explicitly states that "all [JUMP] positions must be on valid instruction boundaries, rather than sitting in the data portion of PUSH operations."

In other words, you cannot jump to a byte in the middle of a PUSH instruction.

For example, consider the following bytecode:

5B600055

At first glance, it may seem like this bytecode consists of a JUMPDEST opcode followed by a PUSH1 00 opcode and an SSTORE opcode, which could potentially write something to storage. However, if we look at the bytes around it, the 5B may actually sit in the middle of a PUSH instruction, for example when the byte right before it is 63 (PUSH4).

In that case it’s not really a JUMPDEST, but part of the number “5b600055” pushed onto the stack. The 5B byte is not on an instruction boundary and cannot be jumped to.
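The rule can be sketched as a single pass that collects valid jump destinations. This is a simplified illustration, not the code of any real EVM implementation; the name validJumpDests is hypothetical, and the leading 63 (PUSH4) in the second call is an assumed context chosen to match the four-byte number discussed above.

```typescript
// Convert a hex string into an array of byte values.
function hexToBytes(hex: string): number[] {
  const clean = hex.replace(/^0x/, "");
  const bytes: number[] = [];
  for (let i = 0; i + 1 < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }
  return bytes;
}

// Return the offsets of all 5B bytes that sit on an instruction boundary.
function validJumpDests(bytes: number[]): number[] {
  const dests: number[] = [];
  let pc = 0;
  while (pc < bytes.length) {
    const opcode = bytes[pc];
    if (opcode === 0x5b) {
      dests.push(pc); // a real JUMPDEST: we arrived here as an opcode
    }
    // PUSH1..PUSH32 (0x60..0x7f): jump over the payload so it is never
    // interpreted as opcodes.
    const payload = opcode >= 0x60 && opcode <= 0x7f ? opcode - 0x5f : 0;
    pc += 1 + payload;
  }
  return dests;
}

// The four bytes on their own: 5B at offset 0 is a genuine JUMPDEST.
const standalone = validJumpDests(hexToBytes("5b600055"));
console.log(standalone); // [0]

// The same bytes behind an assumed PUSH4 (63): the 5B is just data.
const insidePush = validJumpDests(hexToBytes("635b600055"));
console.log(insidePush); // []
```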

“And what about all those sweet 5Bs I saw in the beginning?”

That code is from Seaport, and it is actually a string that contains 5B and is pushed onto the stack with PUSH32.

The string contains “ConsiderationItem[]”, whatever that means, and the “[” character is 5B in ASCII:

Consider a ConsiderationItem[] consideratio
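A quick way to see why string data looks like JUMPDESTs is to encode the text yourself. The sketch below assumes the PUSH32 payload is exactly the 32 characters “ConsiderationItem[] consideratio” quoted above; the helper names are made up for illustration.

```typescript
// Encode an ASCII string into its byte values.
function asciiToBytes(text: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    bytes.push(text.charCodeAt(i));
  }
  return bytes;
}

// Render byte values as a lowercase hex string, two digits per byte.
function toHex(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const h = bytes[i].toString(16);
    out += h.length < 2 ? "0" + h : h;
  }
  return out;
}

// Assumed payload: the 32 characters quoted in the text above.
const payload = asciiToBytes("ConsiderationItem[] consideratio");

console.log(payload.length);        // 32, exactly one PUSH32 payload
console.log(toHex(payload));        // the hex you would see in the bytecode
console.log(payload[17] === 0x5b);  // true: the "[" after "ConsiderationItem"
```

Every “[” in a string constant shows up as a 5B byte, which is why contracts with many array type names in their strings look like they are full of jump destinations.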

So the next time you see a 5B byte in the middle of a PUSH instruction, you'll know it isn't a real JUMPDEST and you can't jump there.

And if you want to read and translate Ethereum bytecode into something more readable, just remember to process it sequentially and take PUSH instructions and their data into account.

Actually, it’s quite easy. Here’s some sample TypeScript code that does exactly that:

https://github.com/comitylabs/evm.codes/blob/main/context/ethereumContext.tsx
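Independently of the linked file, the sequential approach can be sketched in a few lines. This is an illustrative toy rather than a complete disassembler: the MNEMONICS table is deliberately tiny, opcodes missing from it are printed as raw hex, and the names disassemble and byteToHex are invented for this example.

```typescript
// A deliberately tiny mnemonic table; a real tool would list every opcode.
const MNEMONICS: { [opcode: number]: string } = {
  0x00: "STOP",
  0x01: "ADD",
  0x55: "SSTORE",
  0x56: "JUMP",
  0x57: "JUMPI",
  0x5b: "JUMPDEST",
};

// Format one byte as two lowercase hex digits.
function byteToHex(b: number): string {
  const h = b.toString(16);
  return h.length < 2 ? "0" + h : h;
}

// Turn a hex string into "offset: MNEMONIC [data]" lines, one per instruction.
function disassemble(hex: string): string[] {
  const clean = hex.replace(/^0x/, "");
  const bytes: number[] = [];
  for (let i = 0; i + 1 < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }

  const lines: string[] = [];
  let pc = 0;
  while (pc < bytes.length) {
    const opcode = bytes[pc];
    if (opcode >= 0x60 && opcode <= 0x7f) {
      // PUSH1..PUSH32: consume the payload together with the opcode.
      const size = opcode - 0x5f;
      // The slice is simply shorter if the code ends inside the payload.
      const data = bytes.slice(pc + 1, pc + 1 + size);
      let dataHex = "";
      for (let i = 0; i < data.length; i++) {
        dataHex += byteToHex(data[i]);
      }
      lines.push(pc + ": PUSH" + size + " 0x" + dataHex);
      pc += 1 + size;
    } else {
      const mnemonic = MNEMONICS[opcode];
      // Opcodes outside the toy table are shown as raw hex.
      lines.push(pc + ": " + (mnemonic !== undefined ? mnemonic : "0x" + byteToHex(opcode)));
      pc += 1;
    }
  }
  return lines;
}

console.log(disassemble("5b600055").join("\n"));
// 0: JUMPDEST
// 1: PUSH1 0x00
// 3: SSTORE

console.log(disassemble("635b600055").join("\n"));
// 0: PUSH4 0x5b600055
```

The same four bytes produce three instructions or a single PUSH4 depending only on where the walk starts, which is why decoding must always begin at offset 0.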

And here’s a bonus cherry on top - a disassembler for EVM bytecode written entirely in RegExp:

The regex strings are linked in the tweet reply - take a look! It only works in the PCRE2 flavor of regex (Perl family), so you may need to apply some tricks if your language doesn't support it.

So, in the end, we are safe and no jumps to the middle of revert strings are possible. You can breathe out and make yourself your favorite hot drink.

If you like the stuff I write - subscribe, collect, follow me on Twitter and spread the word!

