0. Scope of this review
This review aims to scope out the issues in Valida which must be addressed for an MVP. This review is complementary to but not redundant with the October 2023 Plonky3 / Valida review. That review addressed an earlier version of the code, which had some issues that are not present in the current version. This review also goes deeper into certain issues that were not fully addressed in the October 2023 review, while going less deep into certain issues that review already addressed.
This review should not be considered a comprehensive internal audit of Valida. The goal is not to ensure that Valida is correct; this would require a deeper level of review than is undertaken here. Instead, the goal is to flush out critical issues that must be addressed for the MVP and that can be readily identified at this point via code review. The main focus of this review is checking the STARK implementation for soundness and completeness. Proving the STARK implementation sound and complete is not the goal, nor is that achieved in this review. This review does point out some issues with the current STARK implementation.
Soundness and completeness are the two basic correctness properties which all proving systems must satisfy. Soundness means basically that one cannot create a proof of a false statement (or the likelihood of doing so is negligible). Completeness means basically that the prover can create a proof of any statement for which one has a witness to the statement’s truth (or the probability that the prover cannot do this is negligible). Strictly speaking, soundness is a property of a verifier (it will not accept a proof of a false statement), while completeness is a property of a prover / verifier pair (for every true statement of the type covered, the prover can prove it given a witness, and the verifier accepts that proof).
Proofs of the soundness and completeness of the Valida prover / verifier pair would use as premises the soundness and completeness of the FRI polynomial commitment scheme (PCS) which the Valida prover / verifier pair is built on. Proofs of the soundness and completeness of the FRI PCS would in turn use as premises the soundness and completeness of the FRI low degree test (LDT) which the FRI PCS is built on. This review does not cover the FRI PCS or the FRI LDT. This is not because they are not worthy of further review, but because they are not in the scope of the present review, which focuses on the Valida code only and not the Plonky3 code.
Two other correctness properties we care about, in addition to soundness and completeness, are succinctness and zero knowledge. Succinctness means basically that the proofs are small enough and easy enough to verify. Zero knowledge means basically that having the proofs does not help to find the inputs used to generate the proofs, any more than knowing the facts that they prove helps to find those inputs.
Currently, Valida proofs are not zero knowledge. To make them zero knowledge, we have to introduce blinding factors at appropriate points.
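One standard blinding technique, sketched here only to illustrate the general idea (this is not a confirmed design decision for Valida): instead of committing to each trace polynomial f(x) directly, commit to

    \hat{f}(x) = f(x) + Z_H(x) \cdot r(x),

where Z_H is the zerofier of the evaluation domain H (the polynomial vanishing exactly on H) and r is a random polynomial of small degree. Since Z_H vanishes on H, \hat{f} agrees with f on every row, but its openings at out-of-domain points are randomized, provided r has enough random coefficients for the number of openings made.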
Another correctness property we care about is functional completeness, which means roughly that the VM must be able to run the programs output by the compiler backend.
This review does not try to be the last word on the correctness properties of Valida. The soundness and completeness of Valida have never been proven, as far as I know. I don’t claim that fixing the open issues outlined in this review would result in Valida’s prover / verifier pair being sound and complete, or that it would result in Valida being functionally complete. There are no proofs cited or included in this review. Much more verification work remains to be done on Valida. This review is only a partial contribution towards the goal of verifying Valida and scoping out the remaining work to bring Valida up to spec.
1. Overview of Valida succinct arguments
The STARK argument used in Valida follows a common style of argumentation used in zero knowledge proofs. It commits to traces represented as polynomials. It commits to the quotient of a randomized linear combination of all of the constraint polynomials over the zerofier polynomial of the evaluation domain used to generate the polynomials from the traces. The existence of this quotient as a polynomial implies, with high probability over the choice of the randomized linear combination, that all of the constraint polynomials are zero at all of the rows, i.e., at all of the points in the evaluation domain at which the traces are interpolated.
The verifier succinctly verifies that the quotient exists using the polynomial commitments sent by the prover, along with openings of those polynomials at pseudo-random points selected using the Fiat-Shamir heuristic. These commitments and openings allow the verifier to succinctly verify that at a pseudo-randomly selected point, the equation P(x) = Q(x) * Z(x) holds, for some Q(x), where P(x) is some randomized linear combination of the constraint polynomials over the trace polynomials committed to in the proof.
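Schematically, for a single chip with constraint polynomials C_1, ..., C_k over an evaluation domain H with zerofier Z_H, the prover commits to the quotient

    Q(x) = \frac{\alpha_1 C_1(x) + \cdots + \alpha_k C_k(x)}{Z_H(x)}

for Fiat-Shamir challenges \alpha_i, and the verifier uses the openings at a Fiat-Shamir point \zeta to check that

    \alpha_1 C_1(\zeta) + \cdots + \alpha_k C_k(\zeta) = Q(\zeta) \cdot Z_H(\zeta).

This is only a sketch of the shape of the check; the exact randomization and quotient decomposition used in Plonky3 may differ.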
Valida uses a collection of STARK arguments to prove a single program execution. Each STARK argument is of the form just described, succinctly proving / verifying equations of the form P(x) = Q(x) * Z(x), for P(x) a randomized linear combination of constraint polynomials. Each chip should have its own STARK argument (at least one, but possibly more than one).
Some chips need to interact with each other. Each chip has its own trace, and sometimes these traces need to be related to each other in certain ways for coherence to be achieved. For example, the memory chip trace needs to be consistent with itself (with every read resulting in the last value written to the same address), but the memory chip trace also needs to be consistent with the CPU chip trace (so that every LOAD32 and STORE32 instruction trace is consistent with the memory trace). The former (memory consistency) is a constraint on a single chip’s trace (the memory trace), and the latter (memory consistency with the CPU chip trace) is a constraint relating two chip traces (the memory trace and the CPU chip trace).
Cross-chip interactions are defined using an Interaction abstraction defined in the machine crate of the Valida code. Interactions are of four types: local send, local receive, global send, and global receive. Each interaction defines the columns it covers, the count column used to keep track of the interaction, and the index of the bus argument used to prove the interaction.
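Schematically, the abstraction looks something like the following Rust sketch; the names are approximations based on the description above, not the machine crate's actual definitions:

    /// Illustrative shape of the Interaction abstraction; names are
    /// approximate, not the actual Valida machine crate API.
    pub enum InteractionType {
        LocalSend,
        LocalReceive,
        GlobalSend,
        GlobalReceive,
    }

    pub struct Interaction<Expr> {
        /// The columns (or column expressions) covered by the interaction.
        pub fields: Vec<Expr>,
        /// The count column used to keep track of the interaction.
        pub count: Expr,
        /// The index of the bus argument used to prove the interaction.
        pub argument_index: usize,
    }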
Each bus argument is either a local or a global bus argument. The difference between these is that a local bus argument is local to a single chip, whereas a global bus argument is non-local across multiple chips. It’s not clear to me what difference this makes in practice, since they are treated similarly in the prover / verifier pair. A local bus argument is usable as a form of lookup argument. A local bus argument occurs only in the memory chip.
All of the interactions between all of the chips on a machine are proven together in a single permutation argument. This permutation argument claims that a cumulative sum of terms is equal to zero. Each interaction in the permutation argument trace contributes to the cumulative sum the multiplicative inverse of a randomized linear combination of the field values of that interaction. These randomized linear combinations are meant to be such that the terms in the cumulative sum for corresponding sends and receives cancel each other out. The hypothesis is that the cumulative sum is unlikely to be zero unless the sends and receives do in fact correspond one to one and are consistent with each other. This hypothesis would be a key lemma in a proof of soundness of the permutation argument.
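Schematically, with Fiat-Shamir challenges \alpha and \beta, the claim has the following shape (a sketch of this style of argument, not the exact formula in the code):

    \sum_{\text{sends } s} \frac{\mathrm{count}_s}{\alpha + \sum_j \beta^j f_{s,j}} - \sum_{\text{receives } r} \frac{\mathrm{count}_r}{\alpha + \sum_j \beta^j f_{r,j}} = 0,

where the f values are the field values of each interaction. When every send is matched by a receive with the same field values and counts, the terms cancel pairwise; the key lemma would state that over random \alpha and \beta, the sum is unlikely to vanish in any other case.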
2. Code review notes
The accompanying spreadsheet summarizes the constraints for the chips in the basic machine and the interactions between them.
The following versions of Valida and Plonky3 were referenced in this review:
- Valida: d5ece10968f4ca0c66cb77c4777fd711d322cbfd @ origin ([email protected]:valida-xyz/valida.git)
- Plonky3: 16b590da6d9b235efa67e69e7b67d6ed3b270589 @ origin ([email protected]:Plonky3/Plonky3.git)
3. Soundness and completeness
To address whether the Valida prover / verifier pair is sound and complete or not, the following issues need to be considered:
- Does the STARK argument used ensure (with sufficiently high probability) that the polynomial constraints input into it are true?
- Does the permutation argument used ensure that the requirements of the chip interactions specified by the argument are true?
- Do the polynomial constraints apart from the permutation argument, and the requirements of the chip interactions, suffice to ensure that the statement that’s supposed to be proven is true?
- Can the constraints always be satisfied, and a verifiable proof generated, for any valid trace input into the prover?
This review considered each of these issues. No problems were identified with the STARK argument itself or with the permutation argument; at the same time, the presence of bugs in these pieces was not ruled out.
There are no known cases where a valid trace cannot be proven valid. More extensive testing should be performed to help flush out any cases where a valid trace is not provable.
There are some known issues and possible issues with the polynomial constraints and interactions, which mean that the polynomial constraints of the chips and the requirements of the interactions do not in every case suffice to ensure that the trace is valid. Those issues are as follows:
- In the equality test in the CPU chip, one of the constraints states that not_equal = diff * diff_inv. Since not_equal is 0 or 1, this is satisfied only if diff_inv is the multiplicative inverse of diff, or diff is zero, or diff_inv is zero. If not_equal is 1, then diff_inv is the multiplicative inverse of diff, implying that diff is not zero, implying that the input values are not equal. What's not clear is that if not_equal is 0, then diff is 0: it could be that diff is non-zero, diff_inv is 0, and not_equal is 0. So the argument is not sound. We can fix this by adding another constraint: (1 - not_equal) * diff = 0. This should complete the constraints for the zero test on diff; a concrete sketch follows this item. This issue may be replicated elsewhere where zero tests occur.
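To make the gap concrete, here is a minimal, self-contained sketch over the prime field of order 2^31 - 1 (an arbitrary choice for illustration; the helper names are hypothetical, not the Valida API). It exhibits an assignment that satisfies the original constraints but is rejected once the proposed constraint is added:

    // Field modulus, chosen only for illustration.
    const P: u64 = (1u64 << 31) - 1;

    fn mul(a: u64, b: u64) -> u64 {
        ((a as u128 * b as u128) % P as u128) as u64
    }

    // Original constraint set: not_equal is boolean, and not_equal = diff * diff_inv.
    fn satisfies_original(diff: u64, diff_inv: u64, not_equal: u64) -> bool {
        mul(not_equal, not_equal) == not_equal && not_equal == mul(diff, diff_inv)
    }

    // Fixed constraint set: additionally (1 - not_equal) * diff = 0,
    // which forces diff = 0 whenever not_equal = 0.
    fn satisfies_fixed(diff: u64, diff_inv: u64, not_equal: u64) -> bool {
        satisfies_original(diff, diff_inv, not_equal)
            && mul((1 + P - not_equal) % P, diff) == 0
    }

    fn main() {
        // diff is non-zero, yet diff_inv = 0 and not_equal = 0:
        assert!(satisfies_original(5, 0, 0)); // accepted: the soundness gap
        assert!(!satisfies_fixed(5, 0, 0));   // rejected once the fix is added
    }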
- The memory consistency argument appears questionable. This argument is supposed to check that every read produces the value last written to the same address. As best I can tell, the reasoning process behind the argument is roughly as follows.
Each row of the memory trace has a clock value indicating which row of the main trace it corresponds to. However, instead of being placed in the trace in ascending order of clock value, the rows are supposed to be sorted first by memory address and then by clock value.
It’s not clear that the memory address groups have to be in a particular order relative to each other. It’s not even clear that all of the trace rows corresponding to a particular memory address have to be sorted into a single group. However, every read needs to be preceded by a corresponding write, which means that every group of trace rows for a single memory address needs to begin with a write.
In the memory argument, each memory chip trace row contains a counter value, and these counter values increment sequentially up to the last row of the trace. Each row also contains a diff value, equal to (next.clk - local.clk) for every row other than the last row of a grouping. The memory chip has an interaction with itself which sends the diff to the counter. This suggests to me that every diff needs to be a value which the counter takes at some point, which would ensure that the memory trace is at least as long as the largest difference in clock value between two adjacent rows of the memory trace. This would hold, for example, if the memory trace is at least as long as the main CPU trace. For the bus to balance, counter_mult would have to be set, for each value of counter, to the number of diff values equal to it. I'm not sure exactly what the intention is behind this interaction of the memory chip with itself, however.
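If the intention is indeed a lookup of diff into the counter column (my reading, but unconfirmed), then for the bus argument to balance, with a Fiat-Shamir challenge \alpha, something of the following shape would have to hold:

    \sum_i \frac{1}{\alpha + \mathrm{diff}_i} = \sum_c \frac{\mathrm{counter\_mult}_c}{\alpha + \mathrm{counter}_c},

i.e., counter_mult at each counter value c would equal the number of rows whose diff equals c.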
A possible flaw in the current memory argument is that it doesn’t seem to require the memory chip trace rows within a single memory address grouping to be in any particular order. It needs to require them to be in ascending order by clock value, but I don’t see anything that enforces that this is the case. If the memory chip trace rows within a single memory address grouping are out of order with respect to clock value, then read / write consistency is not guaranteed. However, it is possible as far as I know that this correct ordering by clock value is guaranteed and I am just not seeing how it is guaranteed. I have not proven either way that read / write consistency is or is not guaranteed.
The memory argument interactions use an is_real column, which seems to be used to distinguish between memory trace rows that are used and rows that are not. However, the is_real column is not explicitly referenced in the memory argument polynomial constraints. I’m not sure if this is an issue or not.
See Valida issue #40 for further discussion of what could be wrong with the current memory argument. The issue description refers to some range checks on the address deltas which I don't see implemented in the code. The memory argument could make more sense with these range checks added, but even then it doesn't seem that it would be correct, for the reasons given in the issue description.
I think the memory argument can probably be fixed without changing the basic approach, which seems to be okay. What I would want to see is a fix ensuring that all memory chip trace rows for a single memory address form a single grouping, ordered in ascending order by clock value, without skipping over any intervening memory chip trace rows related to the same address.
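One common pattern for enforcing this, sketched here only as the shape such a fix could take (not a tested design): introduce a boolean flag \delta_i indicating that the address changes between rows i and i+1, and constrain, for each transition,

    \delta_i (\delta_i - 1) = 0,
    (1 - \delta_i)(\mathrm{addr}_{i+1} - \mathrm{addr}_i) = 0,
    \delta_i (1 - \mathrm{is\_write}_{i+1}) = 0,

together with range checks showing that clk_{i+1} - clk_i >= 1 whenever \delta_i = 0 (so clock values strictly ascend within a group) and that addr_{i+1} - addr_i >= 1 whenever \delta_i = 1 (so each address forms a single contiguous group, and each group begins with a write). The range checks on the address deltas correspond to the ones mentioned in issue #40.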
Implementing such a fix, if in fact a fix is needed, could substantially affect the performance characteristics of proving the memory argument. It might be better to take a different approach entirely. Daniel Lubarov has suggested that the fact that the memory argument uses a long, narrow trace is an issue for performance.
Some suggestions have been put forward for alternative memory arguments which could replace the current one. Succinct SP1 uses the BEGKN92 memory argument, and Daniel Lubarov has suggested this may be the one to go with.
- The SRA (shift right arithmetic) instruction does not have a STARK constraint implemented for its output yet.
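For reference, the input / output relation that the missing SRA constraint needs to capture is a sign-extending right shift; in Rust terms (assuming the usual 32-bit convention of masking the shift amount, which I have not verified against Valida's ISA semantics):

    // Reference semantics for SRA: shift right, replicating the sign bit.
    fn sra32(x: u32, shift: u32) -> u32 {
        ((x as i32) >> (shift & 31)) as u32
    }

    fn main() {
        assert_eq!(sra32(0x8000_0000, 4), 0xF800_0000); // negative: sign bit replicated
        assert_eq!(sra32(0x4000_0000, 4), 0x0400_0000); // positive: zero-filled
    }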
- The MULHU32 and MULHS32 instructions on the Mul32 chip do not have STARK constraints implemented for them yet. These are maybe not must-haves, but if we are not going to implement STARK constraints for them, then at the very least they should be moved to another chip which is not shared with the UMUL32 instruction.
- The (8-bit) range checker chip does not have a STARK constraint system implemented for it. This needs to be implemented.
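For reference, here is a hypothetical sketch of what the chip's traces could contain (names illustrative, not the actual Valida API): a preprocessed column holding the constants 0..=255, fixed at setup, and a multiplicity column counting how many times each value is looked up, so that the bus argument can balance the sends from other chips against this chip's receives. The polynomial constraints then only need to pin down the preprocessed column.

    // Hypothetical sketch of the 8-bit range checker's trace contents.
    fn range_checker_traces(values_to_check: &[u8]) -> (Vec<u64>, Vec<u64>) {
        // Preprocessed column: the constants 0..=255, fixed at setup time.
        let preprocessed: Vec<u64> = (0u64..256).collect();
        // Multiplicity column: how many times each value is looked up.
        let mut mult = vec![0u64; 256];
        for &v in values_to_check {
            mult[v as usize] += 1;
        }
        (preprocessed, mult)
    }

    fn main() {
        let (pre, mult) = range_checker_traces(&[0, 255, 255]);
        assert_eq!(pre.len(), 256);
        assert_eq!(mult[255], 2);
    }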
- The Div32 chip does not have a STARK constraint system implemented for it. This is maybe not a must-have, but either the Div32 chip needs to be removed or a STARK constraint system needs to be implemented for it.
- It might be desirable to add range check sends to the outputs of the following chips: Mul32, Div32, Shift32, Lt32, Com32, and Bitwise32.
- Currently nothing checks that the program ROM trace is consistent with the CPU main trace. The interaction that checks this is commented out for some reason; it needs to be re-enabled and possibly fixed.
- Every Valida chip is associated with a so-called “preprocessed trace.” In the degenerate case, which is the usual case, this preprocessed trace is empty. Only the program and range chips have a non-empty preprocessed trace. The preprocessed trace is a trace whose contents are known at setup time (i.e., before the input to the prover is known). Therefore the contents of the preprocessed trace do not depend on the statement being proven or the advice data.
The STARK verifier needs to check that the commitments to the polynomials encoding the preprocessed traces are commitments to the correct polynomials. This is supposed to be done by sending a proof of their openings at a pseudo-randomly selected point. This is currently not being done. Plonky3 issue #250 is a blocker to getting this done.
There may be other missing pieces for the implementation of the preprocessed traces. This needs to be looked into further and validated.
Overall, the verdict is that the Valida prover / verifier pair is currently not sound. It is possibly complete (in the sense that all true statements in its language are provable), but a lot more could and should be done to verify that Valida's prover / verifier pair is complete, in addition to making it sound.
4. Functional completeness
Functional completeness means roughly that the VM must be able to run the programs which the compiler backend outputs. This means that functional completeness is a moving target, because the compiler is changing. The ISA (Instruction Set Architecture) is, in theory, the definition of the interface between the compiler and the VM. We are currently exploring options for the ISA, and its definition is not yet pinned down.
Currently we think that the following instructions are going to be required in the ISA to support the compiler MVP:
For an MVP, all of the required instructions which are not yet supported (marked ❌ in the above table) need to be added to the VM, and STARK constraints need to be implemented for them.
We think that the following instructions would be good to have in the ISA, but not strictly required:
5. Acknowledgements
Thanks to Daniel Lubarov and other teammates for answering some questions about the memory argument, preprocessed traces, and the equality test in the CPU chip.