# Exploring zkVM Design Trade-offs

## 1. Introduction: Design, Balance, Trade-offs

In our previous blog, we delved into the fundamentals of zero-knowledge proofs (ZKPs) and zero-knowledge virtual machines (zkVMs), offering a comprehensive look at the zkVM process. Now, let's focus on the intricate design choices and trade-offs that shape a zkVM.

The quest for optimal zkVM design hinges on achieving a delicate balance. This isn't just about core performance metrics—correctness, security, zero-knowledge, trust assumptions, efficiency, speed, and succinctness. It also necessitates careful consideration of developer experience, broader adoption potential, and overall system complexity. Recognizing the inherent trade-offs within this intricate landscape is crucial. This article aims to share valuable insights gleaned from our research and design journey. We hope to spark discussions on optimizing both zk-VM performance and usability, ultimately paving the way for wider adoption and accelerating innovation in verifiable computing.

## 2. Demystifying zk-VM Design: A Bird's Eye View

Before delving into the intricacies of design trade-offs, let's explore the fundamental design aspects of zk-VMs and their interconnected roles within the zk-VM system and verifiable computation process.

**ISA ↔ VM**

An Instruction Set Architecture (ISA) is a model of computation implemented by a physical computer or a virtual machine. It dictates the format of machine code generated by the compiler and executed by the VM, including the VM's mechanics, such as register layout and memory model.

**Programming Language Support ↔ Compiler**

Selecting the programming languages to support determines which high-level languages can be compiled into machine code.

**Arithmetization ↔ VM, Prover, Verifier**

Arithmetization is the process of converting computational problems into algebraic problems. This involves:

i) Constraint System: A system that encodes VM execution correctness via polynomial constraints.

ii) Fields: Algebraic structures used to represent and manipulate data in arithmetization.

**Proof System ↔ Prover, Verifier**

The proof system is the cryptographic protocol for producing and verifying succinct zero-knowledge proofs that the prover ‘knows’ a solution to the algebraic problem defined by the arithmetization. It is built from two major components:

i) Interactive Oracle Proof (IOP): A process where a prover convinces a verifier of a fact's truth through interactions involving an oracle. The prover sends polynomials to the oracle, and the verifier queries the oracle for evaluations at specific points, with the oracle providing honest answers.

ii) Polynomial Commitment Scheme (PCS): A cryptographic protocol for committing to polynomials and later proving their evaluations at specific points.

**Modular vs. Monolithic Architecture ↔ Entire zk-VM System**

The design of a zk-VM can be either modular, built from pluggable components, or monolithic, a closed integrated solution.

The following sections dissect the motivations behind key design choices for the aforementioned aspects.

## 3. What Instruction Set Architecture should a zkVM use?

An ISA is a crucial component of both physical computers and VMs, defining the model of computation they implement.

### Key Questions an ISA Addresses

- Memory Model: Defines how many address spaces exist, their sizes, and their functionalities. For example:

- Von Neumann Architecture: Single address space for reading, writing, and executing
- Harvard Architecture: Separate address spaces for data (readable and writable) and code (executable)
- 32-bit vs. 64-bit ISAs: 32-bit addresses allow for 2^32 distinct addresses, while 64-bit addresses allow for 2^64 distinct addresses.

- Instruction Set: Specifies the primitive operations the model defines, forming the building blocks for all programs.
- Registers: Details the number of registers, the types of data they can hold, and their purposes.
- Calling Convention: Outlines the protocol for function calls, including argument passing, return values, and register management.

ISAs used for zk-VMs can be broadly classified as general-purpose ISAs and zk-optimized ISAs.

*Instruction Complexity*

ISAs that support more instructions, or instructions that do more work, tend to result in higher per-instruction complexity for proving and verification. However, they also reduce the number of instructions required to solve a problem.

*Net Effects on Costs*

The net effects of ISA complexity on proving and verification costs are not straightforward. The overall cost for a particular computation can be quantified as: *cost per instruction × number of instructions*.

Increasing ISA complexity tends to increase the cost per instruction while reducing the number of instructions required. Adding or removing complexity can be either beneficial or detrimental, depending on the specific case.
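The cost model above can be made concrete with a small sketch. The per-instruction costs and instruction counts below are entirely hypothetical, chosen only to show how the trade-off can tip either way depending on the workload.

```python
# Toy illustration of: total cost = cost per instruction x number of instructions.
# All numbers here are made up for illustration; real costs depend on the
# specific ISA, arithmetization, and workload.

def total_cost(cost_per_instruction: float, num_instructions: int) -> float:
    return cost_per_instruction * num_instructions

# A minimal ISA: cheap instructions, but more of them are needed.
minimal = total_cost(cost_per_instruction=1.0, num_instructions=1_000_000)

# A more complex ISA: each instruction costs more to prove, but fewer are needed.
complex_isa = total_cost(cost_per_instruction=3.0, num_instructions=250_000)

print(minimal)      # 1000000.0
print(complex_isa)  # 750000.0 -- complexity wins here, but only for this workload
```

With different (equally hypothetical) numbers, the minimal ISA would win instead, which is exactly why the net effect is case-by-case.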

Overall, the choice of ISA impacts the efficiency, speed, and succinctness of a zk-VM, as well as the complexity of instructions. ISAs with more complex instructions reduce the number of instructions needed but increase per-instruction complexity. zk-optimized ISAs offer potential performance benefits tailored to zero-knowledge proofs, while general-purpose ISAs leverage existing compiler technology and optimizations, reducing development complexity and time to market.

Valida uses a custom ISA optimized for high-performance zk-proving, unlike standard ISAs designed for electronic computers. The trade-off of using a custom ISA is that it requires Lita to build its own compiler toolchain, enabling developers to write code in familiar languages like C and generate equivalent Valida ISA machine code. This is challenging work, but we believe a new domain-specific ISA is essential to facilitate a paradigm shift towards trustless systems, and that conviction drives our effort despite the challenges.

We will discuss Lita’s custom ISA in greater depth in a future article.

## 4. Which Programming Languages Should a zkVM Support?

Programming languages for zk-VMs can be broadly categorized into two dimensions: level of abstraction and domain specificity.

*Level of Abstraction*

- Low-Level Languages: These present a low abstraction model of computation, requiring programmers to detail how a computation is to be performed, relative to the primitives supported by the target computing environment.
- High-Level Languages: These offer a higher abstraction model, allowing programmers to specify the desired outcome without specifying exactly how the computation is to be performed.

*Domain Specificity*

- General-Purpose Languages: Designed for a wide range of applications, these languages are not tailored to any specific domain.
- Domain-Specific Languages: These are crafted for specific application domains, optimizing features and functionalities for targeted use cases.

*Advantages of Tying to a Domain-Specific Language*

Some zk-VM projects are tied to a domain-specific language, which is the only supported high-level language for programming that zk-VM. This provides advantages such as:

- Supporting one language with a minimal feature set simplifies implementation.
- Language features can be tailored to the zk-VM's capabilities, enhancing ease of implementation.
- The high-level language can be optimized for the most efficient zk-VM implementation.

Examples of such zk-VMs include the Cairo zk-VM (tied to the Cairo language), the Lurk zk-VM (tied to the Lurk language), and the Polygon zk-EVM (tied to Solidity).

*Advantages of Supporting Multiple General-Purpose High-Level Languages*

Alternatively, some zk-VM projects support multiple general-purpose high-level languages, offering benefits like:

- Developers can use feature-rich, familiar languages.
- Developers can leverage established compiler technology, reducing development costs and allowing sophisticated compiler optimizations without the need for re-implementation.

Notable examples of this approach include Valida, RISC Zero, and Succinct SP1.

Selecting the appropriate languages to support significantly influences both the ease of solving complex problems and the adoption rate among developers. In essence, it boils down to usability and accessibility. Ideally, a zk-VM would support every language, making ZKPs truly universal. However, achieving this ambitious goal requires a strategic approach.

These considerations are similar to general programming. For instance, Rust's strict memory model minimizes security vulnerabilities compared to C, but at the cost of increased complexity and potentially lower efficiency. Ultimately, the optimal choice hinges on the project's specific goals and the developer community's needs. Striking a balance between these factors is paramount to building a zk-VM that is both powerful and accessible.

## 5. What Arithmetization Strategy Should a zkVM Use?

Arithmetization is the process of encoding a computational statement - like the machine instructions of a VM - in terms of algebra. In zk-VMs, this involves representing the execution trace and expressing the correctness of execution as polynomial constraints.

Typically, facts in arithmetization are collections of elements from vector spaces over a field (or, more generally, modules over a ring), most often a finite field. These facts can be expressed as polynomials in either coefficient form or point-evaluation form.

We will discuss arithmetization strategies focusing on two main aspects: constraint systems and fields.

In zk-VMs, the execution trace, which records the VM's state at each computational step, is transformed into a matrix or vector of finite field elements. The correctness of the execution is then encoded as polynomial equations or "constraints" among these elements, allowing computational problems to be addressed using algebraic techniques.
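As a toy illustration of this trace-and-constraints encoding, the sketch below records a tiny "VM" whose step rule is the Fibonacci recurrence, and checks a transition constraint on every pair of adjacent rows. The field, trace layout, and step rule are illustrative stand-ins, not any real zk-VM's arithmetization.

```python
# A toy "execution trace" over a small prime field, with a transition
# constraint checked between consecutive rows -- a minimal caricature of
# AIR-style arithmetization. The Fibonacci rule is an illustrative stand-in
# for real VM transition logic.

P = 97  # small prime field F_97, chosen only for readability

def fib_trace(steps: int) -> list[tuple[int, int]]:
    """Each row holds (a, b); the VM 'step' maps (a, b) -> (b, a + b) mod P."""
    rows, a, b = [], 1, 1
    for _ in range(steps):
        rows.append((a, b))
        a, b = b, (a + b) % P
    return rows

def transition_constraint(cur: tuple[int, int], nxt: tuple[int, int]) -> int:
    """Returns 0 iff the step from `cur` to `nxt` is valid (two equations folded)."""
    a, b = cur
    a2, b2 = nxt
    return ((a2 - b) % P) + ((b2 - (a + b)) % P)

trace = fib_trace(8)
assert all(transition_constraint(trace[i], trace[i + 1]) == 0
           for i in range(len(trace) - 1))
print("all transition constraints satisfied")
```

A real arithmetization commits to such a trace as polynomials and proves the constraints hold everywhere, rather than checking rows one by one.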

- Rank-1 Constraint System (R1CS)

R1CS constraints can be automatically generated and optimized starting from a higher-level “circuit” specification, as seen in tools like Circom. However, the constraint specification can often be significantly larger than the witness, necessitating pre-processing to manage complexity effectively. This can lead to substantial complexity in the proof system, as evidenced by protocols like Groth16’s circuit-specific trusted setup or Spartan’s “sparse matrix commitments.”

- Algebraic Intermediate Representation (AIR)

AIR constraints, on the other hand, offer a uniform representation that allows the verifier to work directly with a succinct format, eliminating the need for pre-processing. Despite this advantage, AIR constraints require careful design, limiting the use of automated tools. Their uniformity also makes them less suitable for describing computations without repetitive structures typically found in VMs.

- PLONKish

PLONKish constraints represent a compromise, featuring a partially uniform structure. They utilize non-uniform “copy constraints” and “selectors” to express more general types of computations in a uniform manner. While these constraints require pre-processing, the process is simpler and less expensive compared to R1CS. Most modern AIR-based proof systems, including plonky3/Valida, incorporate these more general types of constraints to enhance their expressiveness.

Constraint systems like AIR, R1CS, and PLONKish are universal, meaning they can encode arbitrary computations. This flexibility makes them suitable for a wide range of applications.

However, the arithmetization process can also incorporate special statements that encode facts difficult or expensive to express as polynomial equations. For example, ensuring that one vector contains the same entries as another, possibly in a different order, can be a complex task. These special statements can be proved using specialized IOPs as sub-protocols. Incorporating such specialized statements effectively can contribute significantly to the efficiency of a zk-VM, at some cost in complexity.

A ring is a type of algebraic structure featuring addition and multiplication operations that satisfy some basic laws of algebra. In a ring, addition is required to be commutative, associative, invertible, and to have an identity element (zero). Multiplication in a ring is required to be associative, unital, and distributive over addition. A field, by definition, is a ring where multiplication is invertible, meaning that every non-zero element has a multiplicative inverse. A finite field is a field with only finitely many distinct values.
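These definitions are easy to check numerically. The sketch below samples the field laws in a prime field F_p; the specific prime and sample values are incidental.

```python
# Minimal finite-field arithmetic over F_p, illustrating the definitions above:
# addition and multiplication satisfy the ring laws, and every non-zero
# element has a multiplicative inverse -- which is what makes the ring a field.

P = 2013265921  # a 32-bit prime (the "baby bear" prime discussed later)

def inv(a: int) -> int:
    """Multiplicative inverse via Fermat's little theorem: a^(p-2) mod p."""
    assert a % P != 0, "zero has no multiplicative inverse"
    return pow(a, P - 2, P)

a, b = 123456789, 987654321
assert (a + b) % P == (b + a) % P            # addition is commutative
assert (a * inv(a)) % P == 1                 # every non-zero element is invertible
assert (a * (b + 1)) % P == (a * b + a) % P  # multiplication distributes
print("field laws hold for these samples")
```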

Designing a ZK proving protocol typically requires selecting one or more rings or fields, often involving extension fields for additional security.

*Elliptic Curve-Based Protocols:* Elliptic curve-based ZK proving protocols are particularly influenced by the choice of the field of coefficients. Each elliptic curve is defined by a polynomial equation over a finite field, forming an elliptic curve group. The protocol selects a point from this group to generate a cyclic subgroup of prime order *p*, isomorphic to the additive group of the finite field Fp.

This setup allows for a homomorphism from the additive group of Fp to the elliptic curve group, facilitating zero-knowledge protocols using polynomials with coefficients from Fp. These protocols often require large fields (e.g., 255-bit) to ensure security, which introduces computational overhead but simplifies operations with large numbers. For smaller fields (e.g., 32-bit or 64-bit), extension fields are used to achieve the necessary security levels, as seen in protocols like Plonky2 and Plonky3.
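The homomorphism property can be illustrated without elliptic curve machinery. The toy commitment below uses modular exponentiation in a multiplicative group as a stand-in for curve-point arithmetic; the modulus and generators are illustrative and far too small for real security. Real protocols use curve points, but the algebraic relationship is analogous.

```python
# A toy additively-homomorphic, Pedersen-style commitment in a multiplicative
# group mod q, standing in for the elliptic-curve group described above.
# The prime q and "generators" g, h below are illustrative, not secure.

q = 1000003          # small prime modulus -- far too small for real security
g, h = 2, 5          # toy generators

def commit(message: int, blinding: int) -> int:
    """Pedersen-style commitment: g^m * h^r mod q."""
    return (pow(g, message, q) * pow(h, blinding, q)) % q

# The homomorphism: commit(m1, r1) * commit(m2, r2) == commit(m1 + m2, r1 + r2),
# i.e. adding committed values corresponds to multiplying commitments.
c1, c2 = commit(10, 3), commit(32, 4)
assert (c1 * c2) % q == commit(42, 7)
print("commitments are additively homomorphic")
```

It is this kind of structure-preserving map that lets protocols manipulate committed polynomial coefficients without opening the commitments.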

*Recent Advances:* Recent advancements have enabled ZK proving protocols to operate over various types of fields and rings, enhancing flexibility and performance. For instance, new protocols can work over binary fields (e.g., Binius), arbitrary finite fields (e.g., Brakedown), and even rings that are not fields (e.g., Rinocchio). These developments allow for the selection of rings that best suit specific use cases. However, accommodating commonly used rings like 32-bit and 64-bit integer rings remains a challenge in current research.

Choosing the right constraint system is crucial for optimizing the efficiency and complexity of a zk-VM. Systems like R1CS, AIR, and PLONKish offer various benefits and trade-offs in terms of pre-processing requirements and suitability for different types of computations. The choice of fields (and rings) is a fundamental aspect of zk-VM design, directly impacting the efficiency and security of zero-knowledge proofs.

Generalizing outcomes of different arithmetization strategies in zk-VM designs is challenging due to their diversity. However, improvements in one metric often worsen another. For example, greater succinctness usually increases proving complexity. Small, secure, easy-to-verify proofs are hard to generate, while easier-to-generate proofs tend to be larger, harder to verify, or less secure. Therefore, there is no universally best strategy; use cases and priorities will always drive design choices.

## 6. What Proof System is Best for zkVMs?

The proof system in zk-VMs, typically a (zk)-SNARK or (zk)-STARK, is built from two major components:

- Interactive Oracle Proof (IOP): In this protocol, the prover represents columns of the witness as polynomials and argues that the witness satisfies the constraints by providing evaluations of these polynomials at "challenge points" randomly sampled by the verifier.

- Polynomial Commitment Scheme (PCS): This cryptographic protocol allows the prover to represent polynomials with succinct “commitments” and later prove that these committed polynomials evaluate to specific values at chosen query points.

It's important to note that "IOP" and "polynomial IOP" (PIOP) are often used interchangeably: in the zk-VM setting, the oracles in an IOP are always polynomials. All references to IOPs here mean PIOPs.

An IOP is an interactive protocol conducted between a prover and a verifier. Each round of the protocol consists of:

- The prover sends polynomial ‘oracles’ to the verifier. These oracles are abstract representations of a polynomial that the verifier can query. A query involves the verifier sending a point to the oracle, which responds with the evaluation of the polynomial at that point.

- The verifier randomly samples some ‘challenge’ values and sends these to the prover. These challenge values should be from a field large enough to provide cryptographic security (e.g., 100+ bits). The challenges include the points at which the verifier queries the oracles and other random values used in the argument.

This interactive process, involving multiple rounds of exchanges, ensures the prover’s assertions about the witness’s correctness are rigorously checked by the verifier.

An IOP and a PCS can be combined to build a zk-SNARK: a zero-knowledge succinct non-interactive argument of knowledge.

- Commitment Instead of Oracles: Instead of sending abstract oracles in the IOP, the prover uses the PCS to ‘commit’ to the polynomials and sends the verifier these succinct commitments.
- Use PCS to Construct Evaluation Proofs: When the verifier queries the polynomial oracle for its evaluation at a point, the prover uses the PCS to construct an evaluation proof and sends this to the verifier along with the evaluation.
- Fiat-Shamir Heuristic: To make the argument non-interactive, the prover applies the Fiat-Shamir heuristic. Instead of asking the verifier to sample random challenge values, the prover evaluates a cryptographic hash function on the commitments. This heuristic models the cryptographic hash function as a “random oracle,” meaning that the output of the hash function cannot be distinguished from a stream of random values dependent on the input.
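The Fiat-Shamir step above can be sketched in a few lines: each challenge is derived by hashing the transcript of prover messages so far, so no interaction is needed. The field modulus and transcript encoding below are illustrative choices, not any particular protocol's.

```python
import hashlib

# Minimal sketch of the Fiat-Shamir heuristic: derive the verifier's
# "random" challenges by hashing the transcript of prover messages.
# The modulus and byte encodings here are illustrative.

P = 2**61 - 1  # challenge field modulus (illustrative)

def fiat_shamir_challenge(transcript: bytes) -> int:
    """Hash the transcript so far; interpret the digest as a field element."""
    digest = hashlib.sha256(transcript).digest()
    return int.from_bytes(digest, "big") % P

# The prover appends each commitment to the transcript and hashes to obtain
# the next challenge, instead of waiting for the verifier to sample it.
transcript = b""
for commitment in [b"commitment-round-1", b"commitment-round-2"]:
    transcript += commitment
    challenge = fiat_shamir_challenge(transcript)
    transcript += challenge.to_bytes(8, "big")
print("final challenge:", challenge)
```

Because every challenge depends on all earlier messages, a cheating prover cannot choose its commitments after seeing the challenges, which is what makes the non-interactive argument sound in the random-oracle model.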

IOPs are designed to prove specific constraint systems and often rely on particular features of those systems. They are compatible with certain finite fields and polynomial properties, such as being "FFT-friendly."

Various IOPs can be employed to prove a given constraint system, each offering different benefits and trade-offs. For each common constraint system, both univariate and multilinear IOPs are known to prove it. The arithmetization itself is independent of this choice; constraint systems aren't intrinsically univariate or multilinear.

IOPs are classified into univariate and multilinear types based on whether the polynomials involved have one or multiple variables.

- Univariate IOPs

Univariate polynomials *p* of degree less than *N* correspond uniquely to trace vectors *t* of length *N* by requiring *t*[*i*] = *p*(*x_i*), where *D* = {*x_1*, …, *x_N*} is a specified set of *N* elements of the field 𝔽 (*D* is typically a multiplicative subgroup).

- Multilinear IOPs

Multilinear polynomials are polynomials in *n* variables with degree at most 1 in each variable. These correspond uniquely to trace vectors *t* of length *N* = 2^*n* by requiring that *t*[*i*] = *p*(*ε_1*, …, *ε_n*), where (*ε_1*, …, *ε_n*) is the binary representation of the integer *i*.
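This multilinear correspondence can be computed directly with the standard multilinear-extension formula. In the sketch below the field, trace values, and bit ordering (low bit first) are illustrative assumptions.

```python
# Sketch of the multilinear correspondence: a trace vector t of length
# N = 2^n defines a multilinear polynomial p with t[i] = p(bits(i)).
# Evaluation uses sum_i t[i] * prod_j (x_j if bit_j(i) else 1 - x_j), mod P.

P = 97  # small prime field, for readability

def mle_eval(t: list[int], point: list[int]) -> int:
    """Evaluate the multilinear extension of t at `point` (n coordinates)."""
    n = len(point)
    assert len(t) == 2**n
    total = 0
    for i, value in enumerate(t):
        term = value
        for j in range(n):
            bit = (i >> j) & 1  # j-th bit of i (low bit = first variable)
            term = term * (point[j] if bit else (1 - point[j])) % P
        total = (total + term) % P
    return total

t = [3, 1, 4, 1]  # N = 4, so n = 2 variables
# At Boolean points the extension reproduces the trace entries:
assert mle_eval(t, [0, 0]) == 3 and mle_eval(t, [1, 0]) == 1
assert mle_eval(t, [0, 1]) == 4 and mle_eval(t, [1, 1]) == 1
print("multilinear extension matches the trace on the Boolean cube")
```

Away from the Boolean cube, `mle_eval` gives the unique multilinear polynomial through those points, which is exactly what the verifier queries at random challenge coordinates.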

The choice between univariate and multilinear IOPs dictates the polynomial commitment schemes that may be used. Historically, univariate IOPs have been more common, partly because the only known polynomial commitment schemes with constant-sized proofs are univariate. However, multilinear polynomial commitment schemes have made significant advances in recent years.

*Divisibility Argument vs. Sumcheck-Based*

To show that a polynomial equation holds at each point in a large set (i.e., for each row of the execution trace), univariate IOPs typically use a divisibility argument, while multilinear IOPs use the sumcheck protocol. Each method relies on distinct algebraic techniques that do not directly translate between settings, resulting in important performance differences.

- Sumcheck Protocol (Multilinear IOPs)

The sumcheck protocol is generally faster and more flexible. The prover only needs to perform a number of field operations, primarily computing linear combinations of vectors, that scales linearly with the witness size *N*. This approach does not require sending additional oracles to the verifier apart from those encoding the witness itself.

- Divisibility Argument (Univariate IOPs)

On the other hand, divisibility arguments, used in univariate IOPs, typically involve computing FFTs (Fast Fourier Transforms), which require *N* log(*N*) field operations. Additionally, the prover must send the verifier oracles for auxiliary polynomials, making the process more computationally intensive.
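The linear-time folding at the heart of the sumcheck protocol can be sketched compactly. The code below runs sumcheck for the sum of a multilinear polynomial given by its evaluation table; the field, seeded randomness (standing in for verifier challenges), and example table are illustrative, and real protocols derive challenges via Fiat-Shamir.

```python
import random

# Minimal sumcheck for a multilinear polynomial given by its evaluation
# table over the Boolean cube (mod a prime). Each round the prover's work
# is a linear combination of the table -- the linear scaling noted above.

P = 2**31 - 1  # prime field modulus (illustrative)

def sumcheck(table: list[int]) -> bool:
    """Prove/verify that the sum of `table` over the Boolean cube is correct."""
    claim = sum(table) % P
    rng = random.Random(0)  # stands in for the verifier's random challenges
    while len(table) > 1:
        half = len(table) // 2
        lo, hi = table[:half], table[half:]  # top variable fixed to 0 / 1
        g0, g1 = sum(lo) % P, sum(hi) % P    # round polynomial at 0 and 1
        if (g0 + g1) % P != claim:           # verifier's round check
            return False
        r = rng.randrange(P)                 # verifier's challenge
        # Fold the table at x = r: (1 - r) * lo + r * hi, and update the claim.
        table = [((1 - r) * a + r * b) % P for a, b in zip(lo, hi)]
        claim = ((1 - r) * g0 + r * g1) % P
    # Final check: one oracle query to the multilinear extension at the
    # challenge point, which after folding is the single remaining entry.
    return table[0] % P == claim

assert sumcheck([3, 1, 4, 1, 5, 9, 2, 6])
print("sumcheck accepted")
```

Each round halves the table, so the total prover work is a geometric series bounded by a constant multiple of *N*, with no FFTs and no auxiliary oracles.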

A commitment scheme is a cryptographic protocol where a prover creates a succinct fingerprint (commitment) of data and proves facts about it without revealing the data itself. A polynomial commitment scheme extends this to polynomials, allowing a prover to commit to polynomials and later prove their evaluations at specific points.
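The "succinct fingerprint" idea can be illustrated with a Merkle commitment to a vector of values: the root is the commitment, and an opening proof reveals one position without revealing the rest. This is the skeleton underlying hash-based PCSs, stripped of the low-degree machinery that makes them *polynomial* commitments; the data and tree size are illustrative.

```python
import hashlib

# A minimal hash-based commitment: a Merkle root as the succinct
# "fingerprint" of a vector, with an opening proof for one position.
# Assumes the number of leaves is a power of two, for simplicity.

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def open_at(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf to root: the opening proof for one position."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling at this level
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    acc = H(leaf)
    for sibling in proof:
        acc = H(acc + sibling) if index % 2 == 0 else H(sibling + acc)
        index //= 2
    return acc == root

evals = [b"17", b"4", b"99", b"23"]   # committed values (illustrative)
levels = merkle_levels(evals)
root = levels[-1][0]                   # the succinct commitment
proof = open_at(levels, 2)
assert verify(root, b"99", 2, proof)   # prove evals[2] without revealing the rest
print("opening verified against the commitment")
```

A polynomial commitment scheme additionally convinces the verifier that the committed vector is the evaluation table of a low-degree polynomial, which plain Merkle openings do not.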

Polynomial commitment schemes can be broadly classified along several axes. While this classification is not exhaustive, it provides a useful overview of the different types of PCSs.

- Coefficient Rings

PCSs operate with polynomials whose coefficients are drawn from specific rings, typically finite fields. The choice of coefficient ring impacts the complexity of polynomial constraint systems, with larger fields simplifying constraints but increasing computational overhead.

For instance, the Cairo zk-VM designs its source language and ISA to align with the most natural coefficient ring for the PCS.

- Univariate vs. Multilinear

Univariate PCS: Commits to univariate polynomials and cannot be used in multilinear protocols. Offers constant-sized proofs (e.g., KZG).

Multilinear PCS: Commits to multilinear polynomials and can be used in univariate protocols. Provides more efficient encoding and linear proving complexity.

- Singular vs. Batch

Singular PCS: Commits to a single polynomial and proves a single point evaluation.

Batch PCS: Commits to multiple polynomials and proves multiple point evaluations simultaneously, useful in complex SNARK protocols.

- Elliptic Curve Based vs. Hash Based

Elliptic Curve Based PCS: Involves commitment through a homomorphism from a vector space to an elliptic curve group, as seen in schemes like KZG. The primary advantage is smaller proofs.

Hash Based PCS: Utilizes hashing (e.g., FRI-based PCS), reducing proving complexity and finite field operations. Generally, KZG PCSs have lower verification complexity than FRI-based PCSs, which in turn have lower verification complexity than inner product arguments.

Succinctness in zk-SNARKs involves three key metrics: proof size, verifier time complexity, and verifier space complexity, all crucial for verifying SNARK proofs on layer 1 blockchains. Layer 1 blockchains have stringent resource constraints due to their highly replicated computations, necessitating strict limits on transaction size and computation complexity.

Proof size, verifier time complexity, and verifier space complexity can each be constant, sub-linear, or linear. For instance, KZG PCSs achieve constant values for all three metrics, but require a trusted setup. In general, PCSs with lower proving complexity tend to produce larger, harder-to-verify proofs: FRI-based PCSs are cheaper to prove with than KZG PCSs, but yield larger proofs that are more expensive to verify.

The design and choice of IOPs in zk-VMs involve trade-offs in computational efficiency, proof size, and complexity, impacting the overall performance of the proof system. The choice between univariate and multilinear IOPs significantly affects the polynomial commitment schemes used. Univariate IOPs have traditionally been more common due to constant-sized proof schemes like KZG, while multilinear IOPs are gaining popularity for their efficient encoding and linear proving complexity. Univariate IOPs typically rely on divisibility arguments, which are computationally intensive, whereas multilinear IOPs use the sumcheck protocol, offering faster and more flexible operations with fewer requirements.

A multilinear hash-based PCS is currently ideal if the priority is low proving complexity, no need for a trusted setup process, and manageable proof sizes. Conversely, a KZG PCS is more suitable if the main concern is minimizing proof size and verification complexity, and a trusted setup process is acceptable. As research progresses and new advancements emerge, these evaluations will likely evolve, necessitating continuous reassessment and adaptation.

## 7. Should a zkVM be Modular or Monolithic?

The design of a zk-VM can be approached in two fundamental ways: modular or monolithic.

Modular zk-VMs offer extensive customization, allowing for the addition of instructions tailored to specific use cases, resulting in fewer instructions and less work for the prover. However, this flexibility comes at the cost of increased complexity and maintenance expenses. In contrast, monolithic zk-VMs are easier to manage and maintain due to their closed, integrated architecture, but they offer less flexibility and are limited to a fixed instruction set.

The choice between modular and monolithic zk-VMs hinges on the project's goals and the required balance between customization and ease of management.

## 8. Beyond Performance: Valida's Design Choices

Having explored the intricacies of zk-VM design choices, let's delve into the core principles behind Valida's architecture and the rationale for our strategic decisions.

- Custom ISA for High-Performance zk-Proving

Valida uses a custom Instruction Set Architecture (ISA) optimized for high-performance zk-proving, unlike standard ISAs designed for electronic computers. In electronic processors, accessing main memory (RAM) is expensive, while register accesses are cheap. In Valida’s AIR arithmetization, accessing RAM is cheap but registers are expensive. Therefore, we eliminate most registers and place all local variables in main memory, greatly reducing the number of local variables and case-by-case logic in Valida’s CPU chip. Additionally, supporting many different instructions is costly for AIR but inexpensive for electronic processors, so the Valida ISA maintains a small and simple set of instructions. The trade-off of using a custom ISA is that it requires Lita to build our own compiler toolchain, allowing developers to write code in familiar languages like C and generate equivalent Valida ISA machine code.

- Extensive Programming Language Support

Valida leverages the LLVM project's Intermediate Representation (IR), enabling the compilation of mainstream programming languages for the Valida zk-VM. This allows us to utilize decades of optimization capabilities, benefiting all supported frontend languages. Initially, we chose C due to its widespread use and influence, providing low-level access to memory and efficient mapping to machine instructions. We are actively working on support for Rust. We are also considering WebAssembly (WASM), which supports more languages than LLVM and is essential for web applications.

- Advanced Constraint System & PCS

Constraint System: Valida uses an extension of AIR (Algebraic Intermediate Representation) that incorporates permutation arguments, lookups, and optional pre-processed data. The uniform, repetitive structure of VM execution makes AIR a great fit. Permutations and lookups enable Valida’s modular design, allowing separate traces to communicate. Pre-processed data is useful for static lookup tables like range checks, allowing the VM to run with arbitrary static data in memory.

Polynomial Commitment Scheme (PCS): We use the univariate FRI (Fast Reed-Solomon IOP) PCS, known for its speed, where the prover’s dominant operation is computing hashes. This scheme benefits from long-term extensive use, resulting in highly optimized and performant libraries like Plonky3. While proof sizes are relatively large, they can be reduced through recursion, making them suitable for blockchain usage. We are also exploring powerful multilinear PCS constructions such as BaseFold and Binius, which offer even faster performance at the expense of larger proofs compared to univariate FRI.

- Optimal Field

We work with the 32-bit “baby bear” prime field, whose characteristic is very close to a power of 2, allowing for very fast computation in hardware. This field is “FFT-friendly,” containing a large power-of-two sized multiplicative subgroup, which is essential to support the univariate “STARK” IOP and FRI.
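The arithmetic behind these claims is quick to verify: the baby bear prime is p = 2^31 − 2^27 + 1 = 15 · 2^27 + 1, so its multiplicative group (of order p − 1 = 2^27 · 3 · 5) contains a subgroup of size 2^27. The sketch below finds a primitive 2^27-th root of unity; the generator search is a generic method, not Valida's actual parameter selection.

```python
# The "baby bear" prime: p = 2^31 - 2^27 + 1 = 15 * 2^27 + 1. Its
# multiplicative group has order p - 1 = 2^27 * 3 * 5, so the field contains
# a multiplicative subgroup of size 2^27 -- the power-of-two subgroup that
# makes it FFT-friendly (radix-2 FFTs up to length 2^27).

P = 2**31 - 2**27 + 1
assert P == 15 * 2**27 + 1 == 2013265921

def find_generator() -> int:
    """Smallest generator of F_P's multiplicative group (p - 1 factors as 2, 3, 5)."""
    for c in range(2, 1000):
        if all(pow(c, (P - 1) // q, P) != 1 for q in (2, 3, 5)):
            return c
    raise RuntimeError("no generator found in range")

g = find_generator()
omega = pow(g, 15, P)             # g^15 has order exactly 2^27
assert pow(omega, 2**27, P) == 1  # omega lies in the subgroup of size 2^27
assert pow(omega, 2**26, P) != 1  # ...and generates all of it
print("primitive 2^27-th root of unity:", omega)
```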

- Emphasis on Modularity

We prioritize modularity in Valida. The Valida argument is separated into several smaller arguments, each handling a specific type of computation, which are then combined through a permutation argument. This modular design is flexible and extensible, allowing us to prove only the computation type being executed at a particular step, rather than proving every type of computation at every step. This approach is a key driver of our performance.

By carefully crafting a custom ISA, supporting a diverse range of programming languages, and leveraging advanced constraint systems and polynomial commitment schemes, Valida achieves exceptional performance and efficiency in zk-proof generation. Furthermore, our commitment to a modular design and the selection of optimal fields empowers the zk-VM with unparalleled scalability. This paves the way for the development of robust and scalable zero-knowledge applications on Valida.

We invite you to join the conversation on zk-VM design and welcome your comments and ideas. If you found this helpful, visit our website and GitBook to learn more about what we are building at Lita. Additionally, we are excited to announce the alpha release of the Valida zkVM and C compiler, allowing you to try running, proving, and verifying programs on Valida.

________________________________________

Disclaimer:

The content of this blog, including text, graphics, and images, is protected by copyright © 2023 - 2024 Lita.Foundation. All rights reserved. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Lita.Foundation with appropriate and specific direction to the original content.

For any questions or permissions, please contact Lita Foundation.