Proposal: Attributable Consensus Solution for DV Clusters

Background

The performance of validators run by a DV cluster depends on the cluster operators doing their jobs successfully and in a timely manner. However, these validators may exhibit good performance metrics even if some operators perform poorly or are offline altogether. While, in principle, this is in the core ethos of DV technology (i.e., allowing well-meaning operators to have faults while keeping the validators unaffected), it is also susceptible to abuse. Indeed, without a special design around it, an operator may, through maliciousness or carelessness, perform its tasks poorly and yet receive the same amount of rewards as the rest of the operators.

In this proposal we address this potential pitfall. We want principal providers and any third party to be able to verifiably check the performance and availability of the operators. This is especially important to enable clusters whose operators do not know or trust each other. For example, an operator in a cluster where a minority of operators are persistently offline should be able to attest to this fact and exclude this minority of operators from receiving rewards. Another feature we want to enable is for a minority of online operators to be able to prove they were online, even if the cluster validators could not perform their tasks (due to not enough cluster operators being online, in which case threshold signatures cannot be created). All of this allows for a fair reward distribution, where the rewards go to well-performing operators.

Here, we describe a mechanism that enables the goals above. Due to the length of our solution, we will split our description into 3 posts. The contents of each post are the following:

  • Post 1 (this one): We describe the problem and its corresponding challenges. We then provide a high level overview of the whole scheme. Finally, we explain how the so-called “Participation Proofs” (see below) are constructed.
  • Post 2: We will explain how participation proofs are used to assign rewards to cluster operators. We will also describe in detail how the correctness of Participation Proofs is ensured via a whistleblower mechanism.
  • Post 3: We will get into low level details of the economics of our solution, as well as the cryptographic details of how certain fraud proofs are verified.

Remark: Here, we use the term “attributability” more broadly than is sometimes done in the literature. Papers like BFT forensics use the term to refer to the ability to detect nodes that caused a fork in a consensus chain (i.e., detect safety violations). Here, we instead want to identify nodes that were online and nodes that were offline (i.e., detect liveness faults).

Problem statement

In this post, we aim to design a mechanism that enables the following:

  • Approximately identifying which operators participated in which duties performed by a DV cluster.
  • Identifying operators O that were ready to participate in a task, even if the task was ultimately performed without O’s collaboration. This should be doable in the following circumstances:
    • A majority of the cluster operators were offline (and hence, it was impossible to perform the task since the signing threshold could not be met).
    • A majority of the cluster is colluding against O, preventing O from participating.
    • O was ready to participate but was not fast enough to do so. For example, a threshold of very fast cluster operators (to which O does not belong) may produce a signature on a block proposal without O’s partial signature. In this scenario, O does not get a chance to participate in the task. However, it is important to still reward O, as it fulfills an important DV role: to save the day when one of the faster operators malfunctions.
  • The participation claims from the previous points should be verifiable on-chain.
  • The solution should cost at most \approx 1 USD per validator per month. That cost should be split among all operators. Dishonest operators may incur a much higher cost.
  • Rewards should be fairly assigned to operators, depending on their participation (and readiness to participate) record.

Challenges

  • Malicious exclusion: Depending on how rewards are distributed, any majority of operators in a DV cluster may be better off ignoring the rest of the operators. This allows the majority to pretend that only the majority participated/was online.
  • Minority: An operator O may be online and ready to participate in DV tasks. However, if a majority of the cluster is offline, O won’t be able to perform any task. Still, we would like O to be able to prove this and get appropriately rewarded for that.
  • Latecomers: An operator in the DV cluster may not participate in performing the cluster duties and even be offline. However, this operator may later go online and pretend that it somehow participated in all tasks. This should not be doable.
  • Non-attributability of threshold BLS signatures: The threshold BLS signature scheme used by Charon DV clusters to perform the validator duties makes it impossible to identify, solely from the aggregated signature, which parties participated or were ready for their corresponding duties.

Overview

In this section we provide a high-level overview of our mechanism. The precise details are provided in the subsequent sections.

Fast track

In our solution, clusters normally perform their DV duties through a fast consensus protocol. As usual, the goal of the cluster is to agree on what messages the cluster validators should sign with their respective validator keys. This is done as quickly as possible, and once a quorum (i.e., at least two-thirds) of the operators has reached agreement, the message is signed and sent. This component of the solution is called the fast track.

C-slots and optimistic Participation Proofs

After a certain period of time, which we call a c-slot (lasting a day on average), the DV cluster runs an additional consensus instance. On this occasion, the cluster seeks to agree on what duties the cluster validators have performed within a certain time window. The consensus produces a piece of data L with this information and a BLS aggregate signature \mathcal{S} of individual operator signatures (this includes a bitfield indicating whose signatures were used). The pair (L,\mathcal{S}) constitutes what we call an optimistic Participation Proof (PP). It is “optimistic” because its reliability depends on optimistic assumptions about the behavior of the cluster operators.

Once (L,\mathcal{S}) is produced, it is hashed. The hash is signed by the cluster operators via a threshold signature and sent to a smart contract on a Layer 2 network within a time window \Delta_{update}. The end of a c-slot occurs randomly and unpredictably. This way, operators that want to include their signature in the aggregate \mathcal{S} must remain online and alert. We assume that, at that point, all operators will participate in the cluster duties. While this may not always be the case, we believe it achieves an appropriate balance in the tradeoff between reliability and gas costs.

C-epochs and whistleblowers

Every 14 days (a period of time which we call a c-epoch), the cluster makes all the optimistic PPs available. Then, whistleblowers can check that the contents of these PPs are correct and raise an alarm if this is not the case. To make the Participation Proofs available to the public, the cluster uses EIP-4844 to create a blob transaction on Ethereum, submitting the PPs as a data blob. An L1 (Ethereum) smart contract stores pertinent hashes and commitments to this data, whose components can be checked against the hashes stored in the L2 in the previous steps. Any whistleblower has ample time to review the proofs and raise complaints if needed. If this occurs, the optimistic PPs are verified on-chain. If the verification succeeds, whistleblowers lose a bond they had deposited; otherwise, the cluster is penalized. Most on-chain verifications occur on the L2 SC to avoid high gas costs, and any party can act as a whistleblower.

To avoid an irrational operator submitting bogus blob data on behalf of an entire cluster, we have the cluster create a threshold signature on the data before this is posted to the L1 SC. Whistleblowers can now raise an alarm if the threshold signature is incorrect.

To amortize transaction costs, we also allow clusters to form coalitions. A coalition of clusters submits a single blob transaction containing all the participation proofs of all clusters in the coalition.

This solution addresses the challenges of “latecomers” and “non-attributability of BLS signatures.” To address “malicious exclusion,” we design our reward mechanism so that operators gain more rewards the more signatures the aggregates \mathcal{S} contain. This way, rational operators are incentivized not to exclude other operators. At the same time, operators cannot simply add incorrect signatures to increase their rewards, since whistleblowers can then raise a complaint.

Pessimistic or VDF Participation Proofs

It remains for us to address the challenge we call “minority.” We would also like to protect operators against “malicious exclusion” when some operators are irrational (and hence are willing to lose some rewards to exclude others).

To this end, we introduce a second type of Participation Proof based on Verifiable Delay Functions (VDFs). Any operator may create such proofs and submit them instead of relying on the optimistic PPs. An operator can create VDF PPs individually, without collaborating with anyone else. Hence, they provide resilience against “minority” and “malicious exclusion.” However, VDF PPs are computationally more costly than optimistic PPs, so we envision a scenario where, as long as things run smoothly, operators rely solely on optimistic PPs and only rarely submit VDF PPs.

A VDF PP consists of proofs of correct computation of a Verifiable Delay Function. As part of its input, the VDF takes certain randomness derived from an L2 (namely, block headers) to prevent operators from computing the VDF PP ahead of time.

A VDF is a function that is relatively expensive to compute and hard to parallelize. Additionally, VDFs come with a mechanism that allows anyone to quickly and cheaply verify that the function was computed correctly.
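
For intuition, here is a minimal Python sketch of the delay-function idea, using iterated SHA-256 purely as a stand-in (the function names and iteration count are illustrative, not part of our design). Real VDF constructions, such as Wesolowski's or Pietrzak's, use repeated squaring in a group of unknown order and come with succinct proofs, so verification is far cheaper than the naive recomputation shown here.

```python
import hashlib

def vdf_eval(seed: bytes, iterations: int) -> bytes:
    """Sequential, hard-to-parallelize computation: each step depends
    on the previous one, so the work cannot be split across machines."""
    out = seed
    for _ in range(iterations):
        out = hashlib.sha256(out).digest()
    return out

def vdf_verify(seed: bytes, iterations: int, claimed: bytes) -> bool:
    """Naive verification by recomputation. A real VDF replaces this
    with a succinct proof that is cheap to check on-chain."""
    return vdf_eval(seed, iterations) == claimed

# The seed is derived from an L2 block header so the output cannot be
# precomputed before the VDF-c-slot begins (header value is hypothetical).
header = bytes.fromhex("ab" * 32)
output = vdf_eval(header, 1_000_000)
assert vdf_verify(header, 1_000_000, output)
```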

Points and rewards

As we mentioned, we want to distribute the rewards earned by the cluster validators fairly. Our idea of fairness is roughly the following:

  • Lenience for occasional faults: Operators who have been mostly online and ready to perform duties should receive almost no penalties, even if they stumble occasionally.
  • Heavy penalization for significant faults: Operators who have not participated —or have not been ready to participate— for a significant amount of time should receive a small share of the rewards.

To accomplish these goals, we assign participation points to operators for the duties in which they participated or were ready to participate (these are computed according to the information in the optimistic PPs and VDF PPs). At the end of a c-epoch, the participation points are converted to a share of rewards. The conversion leverages a logistic points-to-rewards mapping. This mapping has an “S shape”: intuitively speaking, it assigns roughly an even share of rewards if points exceed 3/4ths of the maximum amount of points achievable. Below this threshold, the mapping quickly decays and becomes negligible when the amount of points collected is small.
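
As an illustration, here is a minimal Python sketch of one possible points-to-rewards mapping with the shape just described; the steepness k and the 0.75 midpoint are illustrative choices, not finalized parameters.

```python
import math

def reward_share(points: float, max_points: float, k: float = 20.0) -> float:
    """Logistic ("S-shaped") mapping from participation points to a reward
    weight in [0, 1]. Above roughly 3/4 of the maximum the weight is close
    to 1; below that it decays quickly and becomes negligible."""
    x = points / max_points              # normalized score in [0, 1]
    midpoint = 0.75                      # illustrative inflection point
    return 1.0 / (1.0 + math.exp(-k * (x - midpoint)))

# Operators' final shares are their weights normalized across the cluster,
# so all high scorers end up with roughly even shares.
weights = [reward_share(p, 100.0) for p in (98, 80, 74, 40)]
total = sum(weights)
shares = [w / total for w in weights]
```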

Participation points are not computed on-chain. Instead, an operator in the cluster locally computes and submits the points each operator gained. The points are submitted to the L1 SC at the end of a c-epoch. If the operator were to cheat, a whistleblower could raise an alarm.

Costs for honest parties

We next outline the computational and gas costs of our solution. We only describe the costs incurred by honest parties. Malicious actors may end up paying much more since they will be responsible for the costs of on-chain verifications of optimistic and VDF proofs (in case a whistleblower raises an alarm).

At the end of a c-epoch (once every 14 days), an operator in each cluster coalition:

  • Updates an L1 SC with the points each cluster member earned. This means that less than 256 bits are updated.

    • The gas cost is approx. 21k (tx cost) + 5k (32-byte update) = 26k. At 20 gwei per gas and ETH at 1700 USD, this amounts to \approx 0.8 USD (see the cost sketch after this list).
  • An L1 SC stores the 256-bit versioned hash of the blob data and the 48-byte (BLS12-381) KZG commitment. This costs less than 1.5 USD as of November 2023.

  • Creates a blob transaction with all PPs created during the c-epoch by the clusters in the coalition.

    The blob gas cost of storing a collection M of optimistic PPs (L,\mathcal{S}) is determined by the blob gas price (unknown at the moment) and the size of the data being stored.

    An optimistic PP (L,\mathcal{S}) in a cluster with 1000 validators has an expected size of about 500 bits. In expectation, the cluster creates 14 such proofs during a c-epoch. Hence, the amount of data the cluster submits in the blob is around 7000 bits (0.875 KB). The coalition mechanism allows amortizing the cost of the blob tx among many clusters.

    The cost of blob gas is unknown at the moment, so it is not feasible to obtain a price estimate for this step.

  • The L1 SC stores the identity of the operator submitting all the previously mentioned data, as well as the gas costs of all these transactions (for reimbursement purposes). This amounts to updating at most a 32-byte variable.
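
The back-of-the-envelope arithmetic behind these figures can be reproduced as follows; the gas prices, ETH price, and PP sizes are the assumptions stated above.

```python
# L1 cost sketch for the honest path, using the assumptions from the text.

GAS_TX = 21_000          # base transaction cost
GAS_SSTORE = 5_000       # updating one existing 32-byte storage slot
GWEI = 1e-9
GAS_PRICE_GWEI = 20      # assumed gas price
ETH_USD = 1_700          # assumed ETH price

# Points update: one transaction touching one 32-byte slot.
points_update_eth = (GAS_TX + GAS_SSTORE) * GAS_PRICE_GWEI * GWEI
print(f"points update: {points_update_eth * ETH_USD:.2f} USD")  # ~0.88 USD

# Blob payload for one cluster per c-epoch: ~14 optimistic PPs of
# ~500 bits each, i.e. ~7000 bits = 875 bytes.
pp_bits = 500
blob_bits = 14 * pp_bits
print(f"blob payload per cluster: {blob_bits} bits = {blob_bits / 8} bytes")
```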

L2 costs

In the optimistic path, at the end of each c-slot (on average, a c-slot lasts a day), each cluster:

  • Submits the following data to an L2 SC:

    • A hash Hash_{L2}(L,\mathcal{S}) serving as a commitment to an optimistic PP (L,\mathcal{S}).
    • A threshold signature.

    In total, this is around 600 bits of information.

  • The threshold signature is verified on the L2 SC (this is cheap since computation in the L2 is cheap).

As with the blob transaction gas cost, it is not possible to estimate the cost of these operations reliably at the moment. This is because data storage costs in L2s are bound to change significantly once EIP-4844 goes live.

Terminology, notation, assumptions

We begin by describing some assumptions and terminology our solution uses.

Assumptions

  • We have a reliable L2 with an L1-L2 messaging service that can send data from an L1 Smart Contract to an L2 SC and vice versa. We don’t require this messaging to be fast or highly predictable in execution time.
    Candidate L2s are StarkNet and Arbitrum. Docs for L1↔L2 messaging can be found here and here.

  • Blocks in the L2 are produced at regular, known intervals of time.

    We denote this time interval by \Delta_{L2block}. We allow for a small error in the duration of the block, modeled by a random variable \varepsilon_{L2block}. Thus, it takes \Delta_{L2block} + \varepsilon time to produce an L2 block, where \varepsilon is sampled from \varepsilon_{L2block}.

    Without loss of generality, we assume \varepsilon_{L2block} has expectation 0. Otherwise, we would modify \Delta_{L2block} by adding the expectation of \varepsilon_{L2block} to it.

Terminology and notation

  • C-slot — A collection of consecutive L2 blocks of random length. On average, a c-slot lasts E_{cslot} time. Currently E_{cslot}=1 \text{ day}.

  • C-epoch — A collection of consecutive c-slots. The number of c-slots comprising a c-epoch is variable, but a c-epoch always lasts a number of L2 blocks equivalent to 14-15 days. In expectation, a c-epoch is composed of 14-15 c-slots.

    Note here we assume that L2 blocks are produced at regular, known, time intervals.

  • L1 Smart Contract (SC) — An SC on the L1.

  • L2 SC — An SC on the L2.

  • n — Used to denote the number of operators in a cluster. Note, n may be different among different clusters.

  • Participation Proof (PP) — Verifiable information on the activity of each operator during a c-slot.

  • Optimistic PP — A PP produced via cluster consensus. It is a pair (L, \mathcal{S}) where:

    • L contains:
      • A c-slot identifier, which includes the L2 block header at which the c-slot began.
      • Information on the tasks performed by the cluster’s validators throughout the c-slot.
    • \mathcal{S} is a pair (S, B) where S is an aggregated BLS signature and B is a bitfield (a list of bits) of length n. Intuitively, B indicates which operators’ BLS signatures have been used to create the aggregate S.
  • Verifiable Delay Function (VDF) PP — A PP produced using VDF computations and proofs.

  • VDF-c-slot (AKA mini-c-slot) — A VDF-c-slot lasts for a random amount of time, 1 hour in expectation (this is customizable).

    A c-slot is composed of several consecutive VDF-c-slots.

    A VDF PP is associated with a VDF-c-slot. Such a proof, informally, states that “the operator was online during that VDF-c-slot”.

  • \Delta_{update} — The number of L2 blocks that can be finalized between the end of a c-slot and the moment a cluster submits a commitment to its PP on the L2 SC.

  • L1 ↔ L2 messaging service — A reliable mechanism that allows L1 Smart Contracts to call functions of L2 Smart Contracts and vice-versa.

  • Hash_{L2} — A hash function whose computation gas cost is cheap in the L2. E.g., if the L2 is Starknet, Hash_{L2} can be the Rescue hash function.

  • Blob data — A type of data attached to Ethereum transactions that is kept by the consensus nodes for 18 days, and deleted afterward. It will be enabled in EIP-4844.

  • Hash(Com(*)), versioned hash — A 256-bit SHA-256-based hash of a BLS12-381-based KZG commitment of blob data (the first byte is replaced by a version byte). Cf. EIP-4844.

  • Whistleblower — An agent that locally verifies the correctness of PPs (and other data) submitted by cluster operators. If this verification fails, the whistleblower can raise an alarm.

  • Points — Cluster operators earn points for, roughly, the duties they took part in. Points are later converted to rewards.

  • \Delta_{data} — The number of L1 epochs allowed for submitting blob data and points earned to the L1 SC at the end of a c-epoch.

  • \Delta_{accuse} — The number of L1 epochs during which whistleblowers can raise complaints.

    A c-epoch lasts for \Delta_{data} +\Delta_{accuse} L1 epochs.

    The idea is that during c-epoch n, the points/rewards for c-epoch n-1 are claimed. To do so, there are 2 phases, one in which blob data and points are submitted, and another where the whistleblowers inspect these and maybe complain. The phases last for \Delta_{data} and \Delta_{accuse} epochs, respectively.

  • B[C] — Blob data submitted by a cluster C in a given c-epoch (which we omit from the notation). This is a list

    B[C]=(B[C,1], \ldots, B[C,M])

    where M is the number of c-slots in the c-epoch, and B[C,i] is a PP for the i-th c-slot of the c-epoch.

  • Duty types — There are five validator duty types, each encoded by an integer: 1 → Attestation; 2 → Block proposal; 3 → Sync committee duty; 4 → Sync committee aggregation duty; 5 → Attestation aggregation duty.

  • \Delta_{L2block} — Time between the finalization of L2 blocks. We assume \Delta_{L2block} is the same for all blocks, up to a small error modeled via a random variable \varepsilon_{L2block}, with expectation 0.

  • Cluster coalition — A set of clusters that submits blob data “together” (to save on transaction costs).

  • Aggregator — An operator that submits the blob data of a whole cluster coalition.

Participation Proof mechanism

Optimistic Participation Proofs

At the end of a c-slot, the cluster of operators uses a consensus protocol to agree on the following data (the details of this consensus protocol will be described in a further post):

  • A piece of data L consisting of:

    • L.attestations — the number of attestations missed by all the cluster validators so far in the current c-slot.

      We store the number of missed attestations rather than the number of attestations performed since the latter can be deduced from the former, and we expect the former to be smaller.

      The number of performed attestations is N_{epoch}\cdot N_v-L.attestations, where N_v is the number of cluster validators, and N_{epoch} is the number of L1 epochs in the c-slot.

    • L.start — The L2 slot index and block header at which the current c-slot started.

    • L.slots — For each duty of type i=2,3,4, a sequence of integers

      L.slots.i = (j_{i1},\ldots, j_{in_i})

      where:

      • For t\geq 1, j_{i, t+1} is the difference between the beacon chain slot index at which the (t+1)-th task of type i was performed and the slot index at which the t-th task of type i was performed.
      • Similarly, j_{i,1} is the difference between the slot index at which the first task of type i was performed and L.start.

      Intuitively, the sequence (j_{i1},\ldots, j_{in_i}) is simply a space-optimized register of the slot indices (s_1, \ldots, s_{n_i}) at which the cluster validators performed a task of type i. Given 1\leq t\leq n_i, we let

      Index(t, L.slots.i)=Index(t, (j_{i1},\ldots, j_{in_i})) = s_{t}

      Note that Index(t, L.slots.i)= L.start + j_{i1} + j_{i2} + \ldots + j_{it}.

  • \mathcal{S}, a pair (S, B) consisting of a BLS signature S on the data L and a bitfield B of length n. We expect S to be the aggregated signature of individual operator BLS signatures on L. The bitfield B indicates whose signatures were used for aggregation.
    Note: Verifying the aggregate signature S, which is expensive both in L1 and L2, will only occur when a whistleblower complains. In that case, we will operate on the EVM, assuming a precompile for BLS pairings. The gas cost, in that case, is between 150k and 200k.

  • The hash Hash_{L2}(L, \mathcal{S}) of the above data.

  • A threshold signature on Hash_{L2}(L, \mathcal{S}), denoted TS(Hash_{L2}(L,\mathcal{S})).

We call (L, \mathcal{S}) an optimistic Participation Proof.
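
For concreteness, here is a minimal Python sketch of the data in an optimistic PP, together with the Index function defined above; the field names and Python types are illustrative (the actual encoding is a compact bit-level format, not Python objects).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class L:
    attestations: int                   # attestations missed in the c-slot
    start: Tuple[int, bytes]            # (L2 slot index, L2 block header)
    slots: Dict[int, List[int]] = field(default_factory=dict)
    # slots[i] = (j_i1, ..., j_in_i): delta-encoded slot indices of the
    # tasks of type i = 2 (proposals), 3 (sync), 4 (sync aggregation).

@dataclass
class S:
    signature: bytes                    # aggregated BLS signature on L
    bitfield: List[bool]                # length n; whose keys were aggregated

def index(t: int, deltas: List[int], start_slot: int) -> int:
    """Recover the absolute slot of the t-th task (1-indexed):
    Index(t) = start + j_1 + ... + j_t."""
    return start_slot + sum(deltas[:t])

# Example: proposals at absolute slots 1200 and 4800, c-slot start 1000.
pp_l = L(attestations=3, start=(1000, b"\x00" * 32), slots={2: [200, 3600]})
assert index(2, pp_l.slots[2], pp_l.start[0]) == 4800
```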

What is expected of an optimistic PP

The following requirements are expected to be met by an optimistic PP. If some are not, a whistleblower can raise the alarm, resulting in penalizations for the cluster participants and rewards for the whistleblower.

  • The signature in \mathcal{S} is valid, and the bitfield contains at least 2n/3 ones.

  • L.start contains correct information. I.e., the L2 block number and header at which the c-slot started are correct.

  • The number of attestations missed, as recorded in L.attestations, is coherent (see the sketch after this list). Precisely,

    L.attestations \leq N_{epoch}\cdot N_v,

    where N_{epoch} is the approximate number of L1 epochs the c-slot lasted for. N_{epoch} is, approximately, N_B\cdot \Delta_{L2block}/6.4, where N_B is the number of L2 blocks in the c-slot, and \Delta_{L2block} is the duration of an L2 block in minutes (6.4 is the duration of a beacon chain epoch in minutes).

  • Let i=2,3,4 and let (j_{i1},\ldots, j_{in_i}) be the i-th sequence in L.slots. Let 1\leq t\leq n_i.

    Then a validator in the cluster performed a task of type i at the slot index Index(t, (j_{i1}, \ldots, j_{in_i})).
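
A whistleblower runs these checks locally before deciding whether to raise an alarm. Below is a minimal sketch of the cheap arithmetic checks; BLS signature verification and the L.start check are omitted, and all parameters are as defined above.

```python
def check_optimistic_pp(bitfield, n, l_attestations, n_blocks_l2,
                        delta_l2block_minutes, n_validators) -> bool:
    """Cheap local coherence checks on an optimistic PP (the BLS
    aggregate and L.start checks are left out of this sketch)."""
    # At least 2n/3 of the n operators must have signed.
    if 3 * sum(bitfield) < 2 * n:
        return False
    # Missed attestations cannot exceed the attestations scheduled:
    # N_epoch ~= N_B * Delta_L2block / 6.4 beacon epochs in the c-slot.
    n_epoch = n_blocks_l2 * delta_l2block_minutes / 6.4
    if l_attestations > n_epoch * n_validators:
        return False
    return True
```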

Size of an optimistic PP

  • Size of L.attestations. We have

    |L.attestations| \leq \log_2(N_v \cdot E_{att}),

    where N_v is the number of validators the cluster runs; and E_{att} is the expected number of attestations a single validator performs in a c-slot.

    This corresponds to the worst-case scenario where all attestations are missed by all validators. If we let 0\leq \alpha\leq 1 be the ratio of attestations missed by the validators, then, in expectation,

    |L.attestations| =\log_2(\alpha\cdot N_v \cdot E_{att}),

    For example, with N_v=1000, a c-slot lasting on average a day, E_{att}=225 (the expected number of attestations a single validator performs in a day), and \alpha=1, we have

    |L.attestations|\leq \log_2(1000 \cdot 225) \approx 18 \text{ bits}

    With the same setting and \alpha=0.1,

    |L.attestations| \approx \log_2(0.1\cdot 1000 \cdot 225) \approx 15 \text{ bits}
  • Size of L.slots.2 — Block proposals.

    Currently (October 18, 2023), there are almost 1M validators on the beacon chain. Assuming all of them have a balance of 32 ETH, the chance that a given one of them is selected to propose the block in a given slot is 1/1M.

    There are, in expectation, about 7000 slots in a c-slot (a day). Hence, in expectation, a cluster with N_v validators performs N_v \cdot 7000 /1M \approx N_v\cdot 0.007 block proposals in a c-slot.

    In expectation, the slot distance between two proposals is \approx 7000/(N_v\cdot 0.007)=1M/N_v slots. Hence, each entry j_{2t} in L.slots.2 is, in expectation, 1M/N_v.

    Hence the list (j_{21},\ldots, j_{2n_2}) has bit size, approximately and in expectation,

    N_v\cdot 0.007 \cdot \log_2(1M/N_v)

    This function increases sublinearly on N_v from N_v=1 to N_v = 1M/e (here e=2.718\ldots), where it peaks and then starts decreasing.

  • Size of L.slots.3 — Sync committee

    For 1M validators, a validator is chosen for sync committee duties every 64 months approximately.

    Hence a cluster with N_v validators will, in expectation, perform around N_v/64 sync duties per month, which is n_3=N_v/(2\cdot 64\cdot 14) duties per c-slot. These will be separated, in expectation, by 7000/n_3 slots (since a day has \approx 7000 slots).

    Hence the list (j_{31},\ldots, j_{3n_3}) has size, in expectation,

    (N_v / (2\cdot 64\cdot 14))\cdot \log_2(7000/ (N_v / (2\cdot 64\cdot 14))) = \frac{N_v}{1792}\log_2(12544000/N_v)

  • Size of L.slots.4 — Sync committee aggregation duty

    This is significantly smaller than |L.slots.3|, which is already very small. This is because a validator has to perform a sync aggregation duty 1/8th of the times it is assigned to a sync committee. Hence (j_{41},\ldots, j_{4n_4}) has size, in expectation,

    (N_v / (2\cdot 64\cdot 14\cdot 8))\cdot \log_2(7000/ (N_v / (2\cdot 64\cdot 14\cdot 8))) = \frac{N_v}{14336}\log_2(100352000/N_v)

  • Overall size.

    We conclude that the overall bit-size of (L,\mathcal{S}) is, in expectation and for a cluster with N_v=1000 validators and n operators,

    |(L,\mathcal{S})| = |L| + |\mathcal{S}| \approx 105 + 384 + n = 489 + n

    where 384 is the bit size of the signature in \mathcal{S}, and the +n term accounts for the bitfield in \mathcal{S}, which has length n.

    Note that a c-epoch has 14 c-slots (on average). If the cluster works correctly, it will submit one optimistic PP per c-slot as blob data. This amounts to

    \approx 14\cdot(489 + n) \text{ bits}
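
The size estimates above can be reproduced with the following sketch, which plugs in the in-text approximations (~7000 beacon slots per day-long c-slot, 1M total validators, E_{att}=225); it omits the small L.start contribution.

```python
import math

def optimistic_pp_size_bits(n_validators: int, n_operators: int,
                            alpha: float = 1.0) -> float:
    """Expected bit size of one optimistic PP (L, S), using the in-text
    approximations. alpha is the missed-attestation ratio; the small
    L.start field is omitted for brevity."""
    E_ATT = 225                     # attestations per validator per day
    SLOTS_PER_CSLOT = 7000          # beacon slots in a day-long c-slot
    TOTAL_VALIDATORS = 1_000_000

    # L.attestations: log2 of the (expected) missed-attestation count.
    att = math.log2(max(alpha * n_validators * E_ATT, 2))

    # L.slots.2 (proposals): ~N_v * 0.007 entries of ~log2(1M / N_v) bits.
    n_props = n_validators * SLOTS_PER_CSLOT / TOTAL_VALIDATORS
    props = n_props * math.log2(TOTAL_VALIDATORS / n_validators)

    # L.slots.3 (sync committee): N_v / 1792 entries per c-slot.
    n_sync = n_validators / (2 * 64 * 14)
    sync = n_sync * math.log2(SLOTS_PER_CSLOT / n_sync)

    # L.slots.4 (sync aggregation): 1/8 of the sync-committee entries.
    n_sagg = n_sync / 8
    sagg = n_sagg * math.log2(SLOTS_PER_CSLOT / n_sagg)

    SIG_BITS = 384                  # compressed BLS signature
    return att + props + sync + sagg + SIG_BITS + n_operators

# ~487 bits for (1000 validators, 7 operators); the text's estimate,
# which also counts L.start, is 489 + n bits.
print(optimistic_pp_size_bits(1000, 7))
```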

On attestation aggregation duties

Currently, in each attestation committee (128 validators), 16 validators are selected for aggregation duties. Since a validator is added to an attestation committee once per epoch, this means that, in expectation, a validator performs an aggregation duty once every eight epochs. Since there are \approx 3100 epochs in a c-epoch (14 days), a cluster with N_v validators is assigned \approx N_v\cdot 390 duties of Type 5 in a c-epoch.

These are too many to be recorded in the optimistic PP (L,\mathcal{S}): having N_v=10 would already lead to a list L of size \approx 2.5 KB. For N_v=1000, in expectation, the cluster performs more than one aggregation duty per slot.

Instead, we use the following heuristic: we assume that a validator performs an aggregation duty for every 8 attestation duties.

In other words, we assume validators perform exactly the expected number of aggregation duties (restricted to epochs where they performed an attestation duty).

Thus, we assume that the number of aggregation duties performed by the cluster in a c-slot is

(N_v\cdot N_{epochs}- L.attestations)/8

where, as explained previously, N_{epochs} is the number of L1 epochs spanned during the c-slot.

Possible issues with this approach may arise when the actual number of aggregation duties assigned to the validators differs greatly from the expected number. However, we argue next that this occurs very infrequently.

Indeed, the actual number of aggregation duties assigned to the cluster’s validators is modeled by a random variable X following a binomial distribution on N_v\cdot 3100 experiments with success probability 1/8.

The expectation of this random variable is

\mathbb{E}(X)=N_v\cdot 3100/ 8 \approx N_v\cdot 390.

And its standard deviation is

std(X) = \sqrt{N_v\cdot 3100 \cdot \frac{1}{8}\cdot\frac{7}{8}}\approx 18.41\cdot \sqrt{N_v}

This is a relatively small standard deviation, and indeed the ratio of \mathbb{E}(X) + \lambda \cdot std(X) and \mathbb{E}(X) is

1+\lambda\cdot\frac{0.05}{\sqrt{N_v}}

We see, therefore, that about once in 25 c-epochs (roughly 1 year), the aggregation tasks assigned to a cluster will deviate by more than two standard deviations from the mean. However, even in that scenario, the relative error with respect to the expectation \mathbb{E}(X) is very small.

Additionally, about once every three c-epochs (1.5 months), the aggregation tasks assigned to the cluster deviate by more than one standard deviation. Again, the relative error from the expected number \mathbb{E}(X) is very small in this case.
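
The figures above follow from a normal approximation to this binomial; the sketch below reproduces them.

```python
import math

def aggregation_duty_stats(n_validators: int):
    """Mean and standard deviation of the number of Type-5 duties
    assigned to a cluster in one c-epoch (~3100 beacon epochs),
    modeled as Binomial(N_v * 3100, 1/8)."""
    trials = n_validators * 3100
    p = 1 / 8
    mean = trials * p
    std = math.sqrt(trials * p * (1 - p))
    return mean, std

mean, std = aggregation_duty_stats(1000)
print(mean, std)                    # ~387500 and ~582 for N_v = 1000
print(std / mean)                   # relative 1-sigma error ~0.0015

# Under the normal approximation, |X - E(X)| > 2*std happens with
# probability ~4.6% (about once per 22-25 c-epochs), and > 1*std with
# probability ~32% (about once per 3 c-epochs).
for lam in (1, 2):
    print(lam, math.erfc(lam / math.sqrt(2)))
```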

End of a c-slot

The end of a c-slot is triggered when one of the following events occurs:

  • The following two hold:
    • The header of the most recent finalized L2 block starts with N_{cslot} ones.
    • The c-slot has lasted so far for at least two hours (or some customizable amount of time). This is to avoid having super short c-slots, which could cause latency issues due to different consensus processes taking place simultaneously.
  • The end of the current c-epoch has been triggered (see below).

Here N_{cslot} is a parameter that can be used to control how long, in expectation, a c-slot lasts. It will need to be adjusted so that a c-slot lasts in expectation for E_{cslot} time (currently set at 1 day). The choice of N_{cslot} depends on the L2 we work on, so we leave it unspecified.

A key point is that the end of a c-slot is “random and unpredictable.” The motivation is that operators cannot plan ahead of time and, for example, come online right before the c-slot ends so as to add their signatures to the participation proofs (in this event, the rest of the operators would want to include the latecomer operator, due to how our reward mechanism works; we will explain this in a future post).

Note: It may be the case that blocks are produced too slowly to properly tune N_{cslot} so that c-slots last our desired expected amount of time. In that case, we can use a hash of the block header. Choosing a hash with a larger digest bit-size then allows more granularity in choosing N_{cslot} and the expected duration of c-slots.
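
To illustrate, here is a sketch of the end-of-c-slot check and of how N_{cslot} controls the expected duration; the 10-second block time, the 2-hour floor, and the concrete N_{cslot} value are illustrative only.

```python
import hashlib

def cslot_should_end(header: bytes, n_cslot: int,
                     blocks_elapsed: int, min_blocks: int) -> bool:
    """End the c-slot when the (hashed) header starts with n_cslot
    one-bits AND a minimum duration has passed. Each block triggers
    independently with probability 2**-n_cslot, so the expected c-slot
    length is ~2**n_cslot blocks past the minimum."""
    if blocks_elapsed < min_blocks:          # e.g. the 2-hour floor
        return False
    digest = hashlib.sha256(header).digest() # granularity via hashing
    bits = int.from_bytes(digest, "big")
    leading = bits >> (len(digest) * 8 - n_cslot)
    return leading == (1 << n_cslot) - 1     # top n_cslot bits all 1

# Example: with 10-second L2 blocks and a 2-hour floor (720 blocks),
# n_cslot = 13 gives ~2**13 = 8192 expected blocks past the floor,
# i.e. ~22.8 h, so roughly a one-day c-slot in expectation.
```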

When any of the above two requirements is met, any actor can trigger the L2 SC by calling a specific function.

To incentivize this, we offer a small reward for making the call. Note that we cannot rely on the cluster operators to make this trigger, since the operators could be interested in extending the duration of the c-slot (for example, if they performed poorly during the current c-slot).

When the L2 SC is triggered with an “END C-SLOT” call:

  1. The L2 SC checks that one of the two conditions above is met.

  2. The L2 SC records the corresponding L2 block number, say b.

  3. The L2 SC checks if the end of the c-slot marks the end of the c-epoch. The details of this step are described later on.

  4. Each cluster has \Delta_{update} L2 blocks of time to submit Hash_{L2}(L,\mathcal{S}) and the threshold signature TS(Hash_{L2}(L,\mathcal{S})) to the L2 SC.

    This data is submitted via a race (i.e., the first proof that gets included in the L2 state is the one accepted). We use a race approach to minimize the risk of the time window \Delta_{update} being missed. The “winner of the race” receives a small reward.

    The threshold signature is included to avoid an irrational operator submitting a bogus commitment.

  5. The L2 SC checks whether the threshold signature is valid. This can’t be left for the whistleblowers to check: if it was, an irrational operator could send invalid threshold signatures, griefing the cluster for an entire c-epoch.

  6. The L2 SC reverts if any of the following happens:

    • The hash Hash_{L2}(L,\mathcal{S}) is submitted too late, namely when \text{current block number} > b+\Delta_{update}.
    • The L2 SC has already accepted a commitment to an optimistic PP from the cluster for this c-slot (acceptance includes verifying that the threshold signature on Hash_{L2}(L,\mathcal{S}) is valid).
    • The threshold signature is incorrect.
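
Putting steps 1-6 together, here is a toy Python model of the END C-SLOT flow; the real contract would be written in the L2's native language, and the threshold-signature verifier and all names here are placeholders for the mechanisms described above.

```python
class L2Contract:
    """Toy model of the END C-SLOT flow; not an actual contract."""

    def __init__(self, delta_update: int, verify_ts):
        self.delta_update = delta_update
        self.verify_ts = verify_ts       # threshold-signature verifier
        self.cslot_end_block = None
        self.accepted = {}               # cluster_id -> Hash_L2(L, S)

    def end_cslot(self, current_block: int, condition_met: bool):
        # Steps 1-3: validate the trigger, record the block number, and
        # (not modeled here) check whether the c-epoch also ends.
        assert condition_met, "end-of-c-slot condition not met"
        self.cslot_end_block = current_block
        self.accepted = {}

    def submit_commitment(self, cluster_id, h, ts, current_block: int):
        # Step 6: revert on late submission, duplicates, or a bad
        # threshold signature (checked eagerly, per step 5).
        assert current_block <= self.cslot_end_block + self.delta_update, \
            "submitted after the Delta_update window"
        assert cluster_id not in self.accepted, "commitment already accepted"
        assert self.verify_ts(cluster_id, h, ts), "bad threshold signature"
        self.accepted[cluster_id] = h    # step 4: first valid one wins
```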

End of a c-epoch

A c-epoch lasts for a variable number of c-slots. More precisely:

  • When a c-slot ends, the L2 SC checks whether the number N_B of L2 blocks finalized during the current c-epoch corresponds, in time, to 14-15 days. Precisely, the c-epoch ends when the following check passes in the L2 SC:

    14\cdot \text{days}\leq N_B\cdot \Delta_{L2block}\leq 15\cdot \text{days}

    Here, we assume that L2 blocks last a “constant” amount of time.

    Alternatively, at any time, regardless of whether a c-slot has ended or not, anyone can end the c-epoch if N_B\cdot \Delta_{L2block} > 15\cdot \text{days} (and whoever does so earns a small reward). In that case, the c-slot also ends.

    This is to avoid c-epochs lasting much more than 15 days on rare occasions.

  • If this is the case, the c-epoch number in the L2 SC is updated, and a new “point settlement phase” (which we will describe in a future post) is initiated.

  • The L2 SC sends the message “END OF C-EPOCH” to the L1 SC, as well as N_B and the number M of c-slots that comprised the c-epoch. The L1 SC stores this data, updates its c-epoch number, and begins a new “point settlement phase”.

Note: We want c-epochs to all last roughly the same time because, as we will see next, c-epochs also control the time that clusters and whistleblowers have to settle the points accumulated in the previous c-epoch. This settlement is a complex process with different subphases, which, for safety, should last a fixed amount of time.

Are there any risks associated with situations where an L2 sequencer loses liveness for an extended period of time? It seems that the biggest impact would be on submitting the hash Hash_{L2}(L,\mathcal{S}) and the threshold signature TS(Hash_{L2}(L,\mathcal{S})) to the L2 SC.

Hey @velvetmilkman! Thanks for the question. The risk would be that the hash of the optimistic participation proof doesn’t make it to the L2 SC within the time window \Delta_{update}. The parameter \Delta_{update} should be set so that it is almost impossible for the hash to be delayed by that much while the L2 keeps finalizing blocks. At the same time, \Delta_{update} cannot be too large, as this could allow very slow or offline operators to submit hashes of proofs too late. I would say that setting \Delta_{update} to 1-2 hours should work for both goals here.
