Notes on the EVM

4 minute read

Overview

This post captures my notes from a presentation by the Ethereum Engineering Group, in which Peter Robinson and David Hyland-Wood present most of the content. My goal after listening to this lecture is not to become an expert on EVM opcodes but to get decent exposure to what is happening under the hood. As I continue my journey, the concepts here can provide helpful insight.

Compiler

Bin - Output for the byte code
ABI - Interface file which tells applications how to interact with your code.
Vyper is another EVM language.
The ABI and bin can be used with a wrapper generator (brownie).

Deployment Architecture

A web3 library is used to interact with the Ethereum network.
Compiled Solidity code runs on an Ethereum client; the client is an instance of the software running on a node.

Ethereum Transactions

When sending a simple transaction, the data field is usually empty but doesn’t have to be.
If you are deploying a contract, the to field must be empty.
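To make the two rules above concrete, here is a minimal sketch. The `Tx` class is hypothetical and models only the `to`, `value`, and `data` fields; real transactions also carry a nonce, gas fields, and a signature, and none of these names come from a web3 library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tx:
    """Hypothetical, simplified transaction: only the fields discussed above."""
    to: Optional[str]   # None models an empty `to` field
    value: int = 0      # amount in Wei
    data: bytes = b""   # usually empty for a simple transfer

    def is_contract_creation(self) -> bool:
        # An empty `to` field marks the transaction as a contract deployment;
        # `data` is then treated as init code.
        return self.to is None


# A simple transfer: `to` is set, `data` is empty.
transfer = Tx(to="0x" + "11" * 20, value=10**18)

# A deployment: `to` is empty, `data` carries (the start of) some init code.
deploy = Tx(to=None, data=bytes.fromhex("6080604052"))
```

Modeling the empty `to` field as `None` keeps the deployment check to a single comparison.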

Stack, Memory, Storage, Code, CallData, Logs

The EVM is a stack-based processor.
The EVM can access and store information in six places.
- Stack - EVM Opcodes pop information from and push data onto the stack.
  - The stack has a maximum depth of 1024 words. One word is 32 bytes.
- CallData - The data field of a transaction. These are parameters to the call.
- Memory - A temporary information store that lasts only while a call executes; nothing in it persists after the transaction.
- Storage - A persistent data store.
- Code - Execution code and static data storage.
- Logs - Write-only logger / event output.
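The stack limits in the list above can be sketched in a few lines. The `Stack` class below is my own illustration, not code from the lecture or from any client; it only enforces the 1024-word depth and the 32-byte word size.

```python
WORD_MASK = (1 << 256) - 1   # one word is 32 bytes = 256 bits
MAX_DEPTH = 1024             # maximum number of words on the stack


class Stack:
    """Illustrative EVM-style stack of 256-bit words."""

    def __init__(self) -> None:
        self._items = []

    def push(self, value: int) -> None:
        if len(self._items) >= MAX_DEPTH:
            raise OverflowError("stack overflow: more than 1024 words")
        # Values wider than one word are truncated to 256 bits.
        self._items.append(value & WORD_MASK)

    def pop(self) -> int:
        if not self._items:
            raise IndexError("stack underflow")
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)
```

Pushing word number 1025 raises an error here; in the real EVM that situation aborts execution.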

OpCodes in the Ethereum Yellow Paper

OpCodes represent tasks and operations.

Contract Deployment, Constructors & Init Code Fragments

Init code - The init function includes code to deploy the contract plus the constructor to set up the contract state.
The bin contains the “init code fragment” and the “code to be deployed to the blockchain.”

Constructor

Contracts are deployed using transactions where:

To address is not specified
Data is the init code fragment. This includes the contract binary. This is the compiler output in the *.bin file.

Note:

The init function is not stored on the blockchain.
The data of the transaction is treated as code for contract deployment.

OpCode

MSTORE - Store a word in memory
CALLVALUE - Push how much Wei was sent with the transaction onto the stack. That is the VALUE transaction field.
DUP1 - Duplicate the top of the stack.
- DUP1 - DUP16: These duplicate different levels of the stack (DUP1 the top word, DUP16 the sixteenth word from the top).
ISZERO - Pops a word off of the stack. If the word is zero, push 1 onto the stack, otherwise push 0.
PUSH2 - Push two bytes onto the stack.
JUMPI - Pop two values off the stack. Set the program counter (PC) to stack[0] if stack[1] is not zero.
REVERT - Halt execution and indicate that a REVERT has occurred. Use stack[0] as a memory location and stack[1] as the length of the Revert Reason.
JUMPDEST - Marks a valid jump destination. In the video's example, if the jump to Program Counter 0x10 is taken, execution arrives here.
POP - Pop the top value off of the stack.
SSTORE - Store a word to storage.
CODECOPY - Copy from the code that is executing to memory.
RETURN - End execution, return a result and indicate successful execution.
INVALID - Invalid operation marks the end of the init code.
CALLDATASIZE - Push the size of the transaction data field onto the stack.
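LT - Less than. Pop two words and push 1 if stack[0] is less than stack[1], otherwise push 0.
CALLDATALOAD - Pop the offset stack[0] and push the 32 bytes of CALLDATA that start at that offset onto the stack.
SHR - Shift stack[1] to the right by stack[0] bits. Both values are popped and the result is pushed.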
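CALLDATALOAD and SHR are typically combined to pull the 4-byte function selector out of the call data. The sketch below mimics that in plain Python; the helper names and the sample calldata are my own, chosen only for illustration.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes starting at `offset`; bytes past the end read as zero."""
    chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(chunk, "big")


def shr(shift: int, value: int) -> int:
    """Logical right shift of a 256-bit word by `shift` bits."""
    return value >> shift if shift < 256 else 0


# Hypothetical calldata: a 4-byte selector followed by one 32-byte argument.
calldata = bytes.fromhex("a9059cbb") + (42).to_bytes(32, "big")

# Keep only the top 4 bytes of the first word: shift by 256 - 32 = 224 bits.
selector = shr(224, calldataload(calldata, 0))

# The first argument starts right after the selector, at offset 4.
argument = calldataload(calldata, 4)
```

Here `selector` equals `0xa9059cbb` and `argument` equals `42`.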

Init Code Summary

For the specific example in the video.

Set up Free Memory Pointer (which wasn’t used).
Cause a REVERT if Wei was sent with the transaction (if the constructor is not payable).
Set up storage location with initial non-zero values.
Return a pointer to the contract code and length of code to be stored on the blockchain.
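The first two steps can be traced with a toy interpreter. The bytes below are the prefix I understand the Solidity compiler commonly emits for a non-payable constructor; treat them as an assumption, since the video's exact bytecode may differ. The trailing STOP is added only so the sketch ends cleanly, and memory is modeled as a simple offset-to-word dictionary rather than a byte array.

```python
# PUSH1 80, PUSH1 40, MSTORE, CALLVALUE, DUP1, ISZERO, PUSH2 0010, JUMPI,
# PUSH1 00, DUP1, REVERT, JUMPDEST (at 0x10), POP, STOP (added for the sketch)
INIT_PREFIX = bytes.fromhex("608060405234801561001057600080fd5b5000")


def run(code: bytes, callvalue: int):
    """Run a tiny subset of EVM opcodes; return (status, memory)."""
    stack, memory, pc = [], {}, 0
    while pc < len(code):
        op = code[pc]
        if op in (0x60, 0x61):                 # PUSH1 / PUSH2
            n = op - 0x5F                      # number of immediate bytes
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += n
        elif op == 0x52:                       # MSTORE: offset, then value
            offset, value = stack.pop(), stack.pop()
            memory[offset] = value
        elif op == 0x34:                       # CALLVALUE
            stack.append(callvalue)
        elif op == 0x80:                       # DUP1
            stack.append(stack[-1])
        elif op == 0x15:                       # ISZERO
            stack.append(1 if stack.pop() == 0 else 0)
        elif op == 0x57:                       # JUMPI: destination, then condition
            dest, cond = stack.pop(), stack.pop()
            if cond != 0:
                if code[dest] != 0x5B:
                    raise ValueError("jump target is not a JUMPDEST")
                pc = dest
                continue
        elif op == 0xFD:                       # REVERT
            return "revert", memory
        elif op == 0x5B:                       # JUMPDEST: nothing to do
            pass
        elif op == 0x50:                       # POP
            stack.pop()
        elif op == 0x00:                       # STOP
            return "stop", memory
        else:
            raise ValueError(f"opcode {op:#x} is not supported in this sketch")
        pc += 1
    return "stop", memory
```

Calling `run(INIT_PREFIX, 0)` reaches the JUMPDEST at 0x10 and stops, while `run(INIT_PREFIX, 1)` falls through to REVERT. In both cases the free memory pointer 0x80 has been written to memory location 0x40.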

Function Calls

Public variables automatically have a getter function created.
The compiler can share common code between two functions, such as the getters for these two variables:

uint256 public val1
uint256 public val2

First, the code will allocate a return address on the stack to be utilized later.
The code will then continue with its logic.
It is possible to have a fallback function. In this example, the code has a payable fallback function.
- The nonpayable functions then each have to check the transaction value. Therefore a payable fallback function can result in more opcodes.

Storage

Storage is set up as 32-byte words.
Adjacent variables smaller than 32 bytes are packed into the same word when they fit.
If your value occupies less than 32 bytes of a word, the rest of the word is masked off when the value is read or written.
- The storage itself is not aware that you are storing less than 32 bytes.
If you use the full 32 bytes, fewer opcodes are generated.
- Some opcodes like PUSH use less gas than other opcodes like SSTORE.
If you store multiple values in the same word, more opcodes are generated for the masking.
Well laid out Solidity storage variables will result in fewer storage locations.
Fewer storage locations will save you money.
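The masking mentioned above looks roughly like this when two 16-byte values share one 32-byte word. This is a sketch with made-up helper names, assuming the first variable sits in the low-order half of the word; the compiler emits the equivalent shifts and masks as opcodes.

```python
MASK_128 = (1 << 128) - 1
MASK_256 = (1 << 256) - 1


def pack(low: int, high: int) -> int:
    """Pack two 128-bit values into one 256-bit storage word."""
    return ((high & MASK_128) << 128) | (low & MASK_128)


def read_low(word: int) -> int:
    # Mask off the other variable sharing the word.
    return word & MASK_128


def read_high(word: int) -> int:
    # Shift the upper half down, then mask.
    return (word >> 128) & MASK_128


def write_low(word: int, value: int) -> int:
    # Clear the low half, keep the high half, then OR in the new value.
    cleared = word & (MASK_256 ^ MASK_128)
    return cleared | (value & MASK_128)


word = pack(7, 9)
updated = write_low(word, 1)
```

Reading or writing one packed value costs extra shift and mask steps, but both values live in a single storage location.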

Bytes and Strings

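Bytes and strings are variable-length arrays of bytes that behave the same way.
A bytes variable is allocated a storage slot.
If your bytes variable holds 32 bytes or more, the lowest bit of that slot is set as a marker, the slot holds the length (encoded as length * 2 + 1), and the data itself is stored starting at storage[keccak256(storage slot number)].
Shorter values are stored in the slot itself, together with their length.
In Solidity, all parameters are passed in as 32-byte words.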
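As a sketch of how the slot of a bytes or string variable is filled, the helpers below follow the layout described in the Solidity documentation as I understand it: values up to 31 bytes are left-aligned in the slot with length * 2 in the lowest byte, and longer values leave only length * 2 + 1 in the slot. The function names are hypothetical.

```python
def bytes_slot_word(data: bytes) -> int:
    """Value held in the variable's own slot for a bytes/string value."""
    if len(data) <= 31:
        # Short form: data left-aligned, lowest byte holds length * 2.
        word = data.ljust(31, b"\x00") + bytes([len(data) * 2])
        return int.from_bytes(word, "big")
    # Long form: only length * 2 + 1 is stored here; the data lives elsewhere.
    return len(data) * 2 + 1


def is_long(slot_word: int) -> bool:
    # The lowest bit is the marker: 1 means the long form.
    return slot_word & 1 == 1


def length_of(slot_word: int) -> int:
    if is_long(slot_word):
        return (slot_word - 1) // 2
    return (slot_word & 0xFF) // 2


short_word = bytes_slot_word(b"abc")
long_word = bytes_slot_word(b"x" * 40)
```

For the 40-byte value, `long_word` is 81, so the marker bit is set and the length decodes back to 40.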

Arrays

Values of a dynamic array are stored at locations: storage[keccak256(storage slot number) + index] = value
- The number of elements in the dynamic array is stored at storage[storage slot number]
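The slot arithmetic for a dynamic array can be sketched as follows. Python's standard library does not include Keccak-256 (`hashlib.sha3_256` is the NIST variant and gives different digests), so the hash is passed in as a parameter. The demo uses `sha3_256` purely as a stand-in, which means the numbers it produces are not real Ethereum slot numbers.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]


def array_element_slot(hash_fn: HashFn, slot: int, index: int) -> int:
    """hash(slot) + index, for a dynamic array of 32-byte elements."""
    base = int.from_bytes(hash_fn(slot.to_bytes(32, "big")), "big")
    return (base + index) % (1 << 256)   # storage keys wrap at 256 bits


def stand_in_hash(data: bytes) -> bytes:
    # STAND-IN ONLY: this is SHA3-256, not Keccak-256.
    return hashlib.sha3_256(data).digest()


# Hypothetical array declared at storage slot 2.
first = array_element_slot(stand_in_hash, 2, 0)
sixth = array_element_slot(stand_in_hash, 2, 5)
```

Elements sit in consecutive slots starting at the hashed location, so `sixth` is five slots after `first`.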

Mappings

Values for (mapping(uint256 => uint256) private map;) are stored at locations: storage[keccak256(key . storage slot number)] = value
. is concatenation.
Nothing is stored at storage[storage slot number]
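Here is the same idea for a mapping(uint256 => uint256). The key and the slot number are each padded to 32 bytes, concatenated, and hashed. As a loud caveat, `hashlib.sha3_256` below is only a stand-in for Keccak-256 (which is not in Python's standard library), so the resulting numbers are illustrative and not real slot numbers.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    # STAND-IN ONLY: this is SHA3-256, not Keccak-256.
    return hashlib.sha3_256(data).digest()


def mapping_value_slot(key: int, slot: int, hash_fn=stand_in_hash) -> int:
    """hash(key . slot), with both parts padded to 32 bytes."""
    preimage = key.to_bytes(32, "big") + slot.to_bytes(32, "big")
    return int.from_bytes(hash_fn(preimage), "big")


# Hypothetical mapping declared at storage slot 1.
slot_for_key_7 = mapping_value_slot(7, 1)
slot_for_key_8 = mapping_value_slot(8, 1)
```

Unlike array elements, neighbouring keys land in unrelated slots, because each key goes through the hash separately.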
Each time you use a new memory location, it costs gas.
Requesting a lot of memory will cause an out-of-gas error. This protects against malicious attacks.
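To see why large memory requests run out of gas, here is the memory cost formula from the Yellow Paper as I understand it: 3 gas per 32-byte word plus a quadratic term of words squared divided by 512. The function name is mine, and the result is the total cost of the memory in use, not the cost of a single access.

```python
def memory_cost(size_in_bytes: int) -> int:
    """Total gas charged for having `size_in_bytes` of memory in use."""
    words = (size_in_bytes + 31) // 32        # round up to whole words
    return 3 * words + (words * words) // 512


one_word = memory_cost(32)          # 3 gas
one_kib = memory_cost(1024)         # 98 gas
one_mib = memory_cost(1024 * 1024)  # 2,195,456 gas
```

The quadratic term is negligible for small sizes but dominates quickly: a kilobyte costs 98 gas, while a megabyte costs over two million.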

`CODECOPY` and `EXTCODECOPY`

Each account's storage is entirely separate. Even if some contracts are malicious, they can't affect another contract's storage.
CODECOPY - Copy bytes from this contract's code to memory.
EXTCODECOPY - Copy bytes from another contract's code to memory.
Usages:
- Revert Reason error messages
- Deploy a contract from this contract.
- Any other static data.

Aux Data / Solidity Metadata

Bytes included at the end of modern Solidity contracts contain metadata.
The metadata is CBOR encoded.
Indicates:
- Swarm/IPFS message digest of the metadata file, which in turn leads to the source code.
- Compiler name
- Compiler version
Allows tools like Etherscan to automatically verify that the deployed contract was compiled from specific source code.
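A tool can locate the metadata without decoding the whole contract because, according to the Solidity documentation as I understand it, the last two bytes of the bytecode give the length of the CBOR section in big-endian form. The sketch below only splits the bytes; decoding the CBOR itself would need a third-party library. The sample bytecode is made up.

```python
def split_metadata(bytecode: bytes):
    """Split deployed bytecode into (code, cbor_metadata)."""
    if len(bytecode) < 2:
        return bytecode, b""
    meta_len = int.from_bytes(bytecode[-2:], "big")
    if meta_len + 2 > len(bytecode):
        # The length field does not fit: assume there is no metadata.
        return bytecode, b""
    return bytecode[:-(meta_len + 2)], bytecode[-(meta_len + 2):-2]


# Hypothetical bytecode: 3 bytes of code, a tiny CBOR map, then the length.
fake_meta = bytes.fromhex("a1616101")          # CBOR encoding of {"a": 1}
sample = bytes.fromhex("600035") + fake_meta + len(fake_meta).to_bytes(2, "big")

code, meta = split_metadata(sample)
```

Here `code` is the three leading bytes and `meta` is the four CBOR bytes, with the two length bytes dropped.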

Summary

The EVM is a stack-based processor that has access to:

CallData - Transaction parameters.
Memory - Temporary data stored within a transaction.
Storage - Persistent storage that is part of the world state.
Code - Stores code and static data such as strings.
Logs - Write-only event log output.

Abdul Q Rabbani
