4 minute read

Overview

This post will capture my notes from a presentation by the Ethereum Engineering Group. Peter Robinson and David Hyland-Wood present a majority of the content of this post. After listening to this lecture, my goal is not to become an expert on Solidity opcode. My goal is to get decent exposure to what is happening under the hood. As I continue my journey, the concepts here can provide helpful insight.

Compiler

  • Bin - Output for the byte code

  • ABI - Interface file which tells applications how to interact with your code.

  • Vyper is another EVM language.

  • The ABI and bin can be used with a wrapper generator (brownie).

Deployment Architecture

  • A web3 library is used to interact with the Eutheruem network.

  • Solidity runs on an Etheruem client; the client is an instance of the software running on a node.

Ethereum Transactions

  • When sending a simple transaction, the data field is usually empty but doesn’t have to be.

  • If you are deploying a contract, the to field must be empty.

Stack, Memory, Storage, Code, CallData, Logs

  • EVM is a stack-based processor.

  • EVM can access and store information in six places.

    • Stack - EVM Opcodes pop information from and push data onto the stack.

      • The stack has a maximum depth of 1024 words. One word is 32 bytes.
    • CallData - The data field of a transaction. These are parameters to the call.

    • Memory - An information store accessible for the duration of a transaction.

    • Storage - A persistent data store.

    • Code - Execution code and static data storage.

    • Logs - Write-only logger / event output.

OpCodes in the Ethereum Yellow Paper

  • OpCodes represent tasks and operations.

Contract Deployment, Constructors & Init Code Fragments

  • Init code - The init function includes code to deploy the contract plus the constructor to set up the contract state.

  • The bin contains the “init code fragment” and the “code to be deployed to the blockchain.”

Constructor

Contracts are deployed using transactions where:

  • To address is not specified

  • Data is the init code fragment. This includes the contract binary. This is the compiler output in the *.bin file.

Note:

  • The init function is not stored on the blockchain.

  • The data of the transaction is treated as code for contract deployment.

OpCode

  • MSTORE - Store a word in memory

  • CALLVALUE - Push how much Wei was sent with the transaction onto the stack. That is the VALUE transaction field.

  • DUP1 - Duplicate the top of the stack.

    • DUP1 - DUP31: This will duplicate different levels of the stack.
  • ISZERO - Pops a word off of the stack. If the word is zero, push 1 onto the stack, otherwise push 0.

  • PUSH2 - Push two bytes onto the stack.

  • JUMPI - Set the program counter (PC) to stack[0] if the stack[1] is not zero. Pop two values off the stack.

  • REVERT - Halt execution and indicates a REVERT has occurred. Use stack[0] as a memory location and stack[1] as a length of Revert Reason.

  • JUMPDEST - If the jump to Program Counter 0x10 was taken it would arrive here. The JUMPDEST opcode indicates valid jump destinations.

  • POP - Pop the top value off of the stack.

  • SSTORE - Store a word to storage.

  • COPYCODE - Copy from code that is executing to memory.

  • RETURN - End execution, return a result and indicate successful execution.

  • INVALID - Invalid operation marks the end of the init code.

  • CALLDATASIZE - Push the size of the transaction data field onto the stack.

  • LT - Less than

  • CALLDATALOAD - Push the 32 bytes (of CALLDATA) onto the stack at offset stack[0].

  • SHR - Shift stack[1] to the right stack[0] times, pop stack[0] off the stack.

Init Code Summary

For the specific example in the video.

  • Set up Free Memory Pointer (which wasn’t used).

  • Cause a REVERT if Wei was sent with the transaction (if the constructor is not payable).

  • Set up storage location with initial non-zero values.

  • Return a pointer to the contract code and length of code to be stored on the blockchain.

Function Calls

  • public variables automatically have a getter function created.

  • Common code can be used between 2 functions and 2 calls:

uint256 public val1
uint256 public val2
  • First, the code will allocate a return address on the stack to be utilized later.

  • The code will then continue with its logic.

  • Possible to have a fallback function. In this example, the code has a payable fallback function.

    • The nonpayable functions will have to check transaction values. Therefore a fallback function can “create” more opcode.

Storage

  • Set up as 32-byte words.

  • Variables smaller than 32 bytes will be stored in the same word.

  • If your value is less than 32-bytes in a specific location, the leading values are masked.

    • The code is not aware that you are storing less than 32 bytes.
  • If you use the full 32-bytes, you “create” less opcode.

    • Some opcodes like PUSH use less gas than other opcodes like SSTORE.
  • If you store multiple values in the same word, you will be “creating” more opcode.

  • Well laid out Solidity storage variables will result in fewer storage locations.

  • Fewer storage locations will save you money.

Bytes and Strings

  • Bytes and strings are variable-length arrays of bytes that behave the same way.

  • The bytes keyword is allocated a storage slot.

  • If your bytes variable holds more than 32 bytes, the marker bit indicates the length of your variable in bytes.

  • In Solidity, all parameters are passed in as 32-byte words.

Arrays

  • Values of an array are stored at locations: storage[keccak256(storage slot number) + key] = value

    • The number of elements in the dynamic array is stored at storage[storage slot number]

Mappings

  • Values for (mapping(uint256 => uint256) private map;) are stored at locations: storage[keccak256(key . storage slot number)] = value

  • . is concactenation.

  • Nothing is stored at storage[storage slot number]

  • Each time you use a memory location, it cost gas.

  • Requesting a lot of memory will cause an out-of-gas error. This protects against malicious attacks.

CODECOPY and EXTCODECOPY

  • Each account storage is entirely separate. If various contracts are malicious, they cant affect each other’s storage.

  • Code Copy - Copy bytes from this contract to memory.

  • Ext Code Copy - Copy bytes from another contract to memory.

  • Usages:

    • Revert Reason error messages

    • Deploy a contract from this contract.

    • Any other static data.

Aux Data / Solidity Metadata

  • Bytes included at the end of modern Solidity contracts contains metadata.

  • The metadata is CBOR encoded.

  • Indicates:

    • Swarm/IPFS message digest, which is the location of the source code.

    • Compiler name

    • Compiler version

  • Allows tools like Etherscan to automatically verify that the deployed contract was compiled from specific source code.

Summary

The EVM is a stack-based processor that has access to:

  • CallData - Transaction parameters.

  • Memory - Temporary data stored within a transaction.

  • Storage - Persistent storage that is part of the world state.

  • Code - Stores code and static data such as strings.

  • Logs - Write-only event log output.

Leave a comment

Your email address will not be published. Required fields are marked *

Loading...