Author: Web3er Liu, CatcherVC
As concepts such as Web3, the Metaverse, and NFTs entered the public eye over the past year, the crypto industry entered a period of rapid growth. Pursued by capital and by a massive user base, Ethereum has used its first-mover advantage to become the core of the entire Web3 narrative. After a long evolution, its system architecture has fully realized decentralization and security, making it a veritable "star public chain." At the same time, inefficiency severely limits its development. Compared with VISA, which processes thousands of transactions per second, Ethereum, with a TPS of less than 20, looks like an antique from another era, a far cry from Vitalik's grand vision of a "world-class decentralized application platform."
In order to meet the massive demand of the Web3 market, different scaling solutions such as side chains, new public chains, and Rollups have successively taken the stage. While star projects such as BSC, Polygon, Solana, Arbitrum, and Optimism divide up the traffic, their inherent defects are becoming more and more apparent. Since block generation speed constrains TPS, almost all major Layer 2 networks and new public chains have reduced the number of nodes or decoupled "consensus" from the block generation process. This directly reduces block time but seriously weakens decentralization and system security.
Take Optimism as an example. It uses a single block-producing node called the Sequencer to generate blocks on Layer 2, and blocks are generated in seconds. New blocks do not need to be handed over to other nodes immediately for verification but can be finalized locally, which saves a lot of time. Since there is only one block-producing node, the allocation of "bookkeeping rights" is deterministic, so the POW process (the step of randomly assigning bookkeeping rights) can be skipped entirely.
By shortening the block generation process, a local Layer 2 block can go from generation to finalization in 1 second or less. After a user initiates a transaction request, the result arrives within two or three seconds, on par with WeChat Pay.
However, at this point the new Layer 2 block has not been audited by any verification node, so it may be invalid. To address this, the Sequencer regularly publishes a copy of its local blocks on Layer 1, including the transaction data and the StateRoot (which is tied to the account information on Layer 2). The Verifiers (verification nodes) on Layer 2 automatically read the content published by the Sequencer and audit it to determine whether the Sequencer is suspected of fraud.
Essentially, Optimism uses Ethereum as a "court" for disclosing data and settling disputes, and the key point is how often the Sequencer publishes data on Layer 1. If the Sequencer goes a long time without submitting its local data, it delays the Verifiers' audit, and nodes take a long time to reach consensus, which seriously weakens the reliability of Layer 2.
According to Optimism's official block explorer, the interval between the Sequencer's publications of state information on Ethereum can be as long as 37 minutes. This means that after the Sequencer generates a block, a Verifier may have to wait 37 minutes before it can audit that block. In contrast, a new block on Ethereum is audited by nodes across the entire network within about 13 seconds. The information asymmetry between the Sequencer and the Verifier nodes on Optimism is severe, and the reliability of its consensus mechanism is much lower than that of Ethereum.
In this regard, Arbitrum, which also belongs to the OP Rollup camp, shortens the interval for submitting state information to once every 2 to 5 minutes. This lets Verifier nodes audit the state sooner and significantly reduces the information gap.
However, Arbitrum has the same flaw as Optimism: the Sequencer node responsible for producing blocks is run by the project team, and the "bookkeeping right" is not opened to outsiders. While the mechanism design is still imperfect, "procedural justice" cannot be guaranteed. As a safeguard, block production on Arbitrum and Optimism is backed by the credit of the project teams to make up for the imperfections of the current mechanism.
The consequences are clear: Arbitrum and Optimism essentially become centralized operators. Although both allow users to run Verifiers (verification nodes) and challenge the Sequencer freely, the project team still has the final say over appointing and removing the Sequencer. Even if a Verifier proves that the current Sequencer has acted maliciously and forces it to step down, the new Sequencer will still be designated by the project team.
Essentially, Layer 2's block-producing power is concentrated in the hands of the Arbitrum and Optimism teams, and it rests on "credit" rather than "procedural justice." Having the teams run the Sequencer nodes also brings another big problem: the block-producing nodes are few and physically centralized, which leaves them prone to DDoS attacks and other single points of failure.
Take Arbitrum as an example: its Sequencer node has gone down twice, attracting widespread attention. On September 14, 2021, both Arbitrum and Solana went down due to DDoS attacks; the block-producing node received too many transaction requests in a brief period, which eventually led to a crash. On January 10, 2022, Arbitrum's Sequencer node went down again. The team said that the node had suffered a hardware failure and that the standby node did not complete the handover in time. In the end, this "single point of failure" shut down the entire Arbitrum network.
The disadvantage of centralized systems such as Arbitrum and Optimism clearly lies in the excessive concentration of resources. Only a single node, or a tiny number of nodes, is responsible for generating blocks, so it bears a large amount of access traffic and easily becomes a single point of failure. Monopolized block production also renders "fraud proofs" and the "challenge mechanism" largely useless, so they cannot curb malicious node behavior at the root.
Regarding these inherent defects, the Arbitrum and Optimism teams have stated that they will gradually implement decentralization in the future. However, neither has given a reliable solution, and actual decentralization is still far away.
To comply with the fundamental principles of decentralization, Metis, which is also an OP Rollup scheme, has recently begun reforming its system architecture. Through these changes, Metis plans to take the lead in decentralizing Layer 2 both architecturally and economically.
In addition, Metis changed the format in which it backs up data on Ethereum. Given that the peer-to-peer node network can immediately verify the Sequencer's local blocks and so prevent it from acting maliciously within the Layer 2 network, Metis backs up the transaction instructions to the off-chain storage platform Memolabs and publishes on Layer 1 only the location of that transaction data in Memolabs. The StateRoot corresponding to each transaction is still published on Layer 1.
For possible "challenge" and "fraud-proof" scenarios, Metis adds further functions so that, when such scenarios occur, the challenger can restore the original data of each transaction instruction on Layer 1 and complete the fraud proof without hindrance. This keeps the new mechanism equivalent to the old one.
By introducing peer nodes and integrating the Memolabs storage layer, Metis shifts the storage task from Ethereum alone to peer nodes, Ethereum, and Memolabs together, and introduces new mechanisms to ensure reliability. Since the other two share the storage task, Metis can reduce the amount of data published on Ethereum accordingly, cutting Gas consumption and significantly lowering Layer 2 fees.
In the following, the author examines the key measures: Metis's implementation of a peer-to-peer node network and its integration of Memolabs storage.
In conventional OP Rollup schemes such as Optimism and Arbitrum, the block producer is uniquely determined: only one Sequencer executes transactions and packs blocks. This eliminates any randomness in choosing the block-producing node, so the system no longer spends time selecting a block producer at the start of each block-producing cycle. In contrast, before each new block is generated on Ethereum, the network must go through the POW process (or POS, after the Merge) to randomly select the block-producing node, which costs considerable time.
However, randomizing the block-producing node significantly reduces the probability of a single node acting maliciously. Because bookkeeping nodes rotate frequently, the chance that a malicious node controls the bookkeeping right is very low. Even if a malicious node obtains the bookkeeping right for a new block, the block will still be rejected by the other honest nodes if it is not compliant. The honest nodes will then elect a new block producer and publish a compliant block, and the malicious node will simply be sidelined.
In this case, as long as more than 2/3 of the nodes in the network are honest, malicious nodes can be effectively restrained; this is the well-known PBFT (Practical Byzantine Fault Tolerance) mechanism. However, the effectiveness of PBFT depends on having enough nodes. Only when the number of nodes is large is it difficult for a malicious node to win over many others and form a colluding group. When few nodes participate in block generation, PBFT no longer provides meaningful protection, and the likelihood of a single node acting maliciously is exceptionally high.
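The two-thirds threshold can be made concrete with a short calculation. The sketch below uses the standard Byzantine fault tolerance bound, under which a network of n nodes tolerates at most f malicious nodes when n >= 3f + 1. The function names are illustrative and are not taken from any Metis or Ethereum code.

```python
def max_faulty(n: int) -> int:
    """Largest number of malicious nodes f that satisfies n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest vote count such that any two quorums share an honest node."""
    return (n + max_faulty(n)) // 2 + 1


for n in (1, 4, 13, 100):
    print(f"nodes={n} tolerated_faulty={max_faulty(n)} quorum={quorum(n)}")
```

With a single block producer (n = 1), the tolerated number of malicious nodes is zero, which is the situation the article describes for a lone Sequencer.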
Existing OP Rollups, including Optimism and Arbitrum, almost all assume by default that the Sequencer will not act maliciously. If the Sequencer does behave maliciously, a Verifier node is allowed to "impeach" it; this process is called a "challenge." The problem is that data synchronization between the Verifier nodes and the Sequencer is not immediate, so there is a delay in between.
As mentioned earlier in this article, the data synchronization delay of Optimism nodes can exceed 30 minutes: after the Sequencer generates a new block, it can take half an hour before a verification node audits it, which creates potential security risks. Although Arbitrum reduces the delay to a few minutes, it does not open the right to run a Sequencer to anyone other than the project team, which is not conducive to economic decentralization. Moreover, the system rests on the "credit" of the project team, which seriously violates the "procedural justice" principle of blockchain.
In addition, since Optimism and Arbitrum have not issued tokens, they cannot strongly incentivize validator node operators, which makes it hard to expand the number of nodes and leaves Layer 2 looking more like a consortium chain than a public chain.
In order to avoid the above problems, Metis has made many improvements to the original Optimism architecture, the most important of which is the introduction of peer nodes.
In essence, the Sequencer Pool, which was initially a subnet under the Metis network, has become a "committee." This committee is composed of peer nodes. Its function is to act as or supervise a Sequencer, similar in form to a POS public chain.
According to the scheme Metis is implementing, the Sequencer Pool has been put into operation with more than a dozen peer nodes. At this network scale, communication overhead between peer nodes is low, and consensus on new blocks can be reached almost immediately. At the same time, the peer nodes can share the load of serving external access requests, so users do not have to rely on data provided unilaterally by a single node.
Metis now has two security layers: the peer-to-peer node network and the Verifier nodes. The peer nodes verify the Sequencer's local data on Layer 2 in real time, while the Verifiers are mainly responsible for verifying the data the Sequencer submits to Layer 1.
In the future, Metis plans to greatly expand the number of peer nodes in the Sequencer Pool to make it more secure, and to incorporate Verifier nodes into the Sequencer Pool so that every peer node can serve as both Sequencer and Verifier. Metis also plans to introduce a new algorithm and timestamp generation mechanism that, while maintaining high efficiency, will "change the Sequencer every few blocks" to ensure decentralization.
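The article does not describe the rotation algorithm Metis intends to use. Purely as an illustration of "change the Sequencer every few blocks," the sketch below assumes a simple round-robin schedule; `sequencer_for_block`, the node names, and `blocks_per_turn` are all hypothetical.

```python
from typing import Sequence


def sequencer_for_block(height: int, pool: Sequence[str], blocks_per_turn: int) -> str:
    """Pick the block producer for a given height by round-robin rotation.

    Hypothetical scheme: each node in the pool produces `blocks_per_turn`
    consecutive blocks before the role passes to the next node.
    """
    if not pool:
        raise ValueError("the Sequencer Pool is empty")
    if blocks_per_turn < 1:
        raise ValueError("blocks_per_turn must be at least 1")
    turn = height // blocks_per_turn
    return pool[turn % len(pool)]


pool = ["node-a", "node-b", "node-c"]
schedule = [sequencer_for_block(h, pool, blocks_per_turn=2) for h in range(8)]
print(schedule)
```

A predictable schedule like this is easy to reason about, but a production design would more likely involve randomness or stake weighting so that the next producer cannot be targeted in advance.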
In most public chains and Layer 2 networks, the database that records user information adopts a tree-like structure called a state tree, and the hash value of the tree root is called the StateRoot. After a transaction instruction is executed, the state of some accounts inevitably changes, and the hash value of the root of the state tree changes accordingly. In other words, the execution of each transaction generates a new StateRoot; in chronological order, transactions and StateRoots are in one-to-one correspondence.
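To illustrate how a single root hash commits to every account, here is a minimal sketch using a plain binary Merkle tree over account balances. This is a simplification: Ethereum itself uses a Merkle Patricia trie, and the account encoding below (`name:balance`) is an assumption made only for this example.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary Merkle tree built over the given leaves."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def state_root(accounts: Dict[str, int]) -> str:
    """Hex StateRoot over accounts, sorted by name so the result is deterministic."""
    leaves = [f"{name}:{balance}".encode() for name, balance in sorted(accounts.items())]
    return merkle_root(leaves).hex()


accounts = {"alice": 10, "bob": 5, "carol": 0}
root_before = state_root(accounts)

# Executing one transfer changes two leaves, and therefore the root.
accounts["alice"] -= 3
accounts["bob"] += 3
root_after = state_root(accounts)

print(root_before != root_after)
```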
If you list each [transaction instruction content] and the corresponding [StateRoot] in chronological order, you can get an accurate ledger. In traditional OP Rollup schemes such as Optimism, this is what Sequencer stores on Ethereum.
The Verifiers read these records and check their accuracy. Generally speaking, a Verifier node executes the transaction instructions in chronological order and computes its own sequence of StateRoots. It then only needs to compare the StateRoots it calculated with those submitted by the Sequencer. This is like a teacher who does not have the answer key in advance and works out each answer independently to check students' math homework.
If Verifier finds a problem with a transaction instruction or the corresponding StateRoot submitted by Sequencer, it will initiate a "challenge" and provide a "fraud-proof."
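The Verifier's check can be sketched as a replay loop: re-execute each transaction, recompute the root, and stop at the first disagreement. In this minimal sketch the transaction format, the flat-hash `state_root`, and the function names are assumptions for illustration, not the actual Optimism or Metis implementation.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple

Tx = Tuple[str, str, int]  # (sender, recipient, amount); hypothetical format


def state_root(accounts: Dict[str, int]) -> str:
    """Flat hash of all balances; a stand-in for a real state-tree root."""
    encoded = json.dumps(accounts, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_tx(accounts: Dict[str, int], tx: Tx) -> Dict[str, int]:
    """Return a new state with the transfer applied; reject overdrafts."""
    sender, recipient, amount = tx
    if amount < 0 or accounts.get(sender, 0) < amount:
        raise ValueError("invalid transfer")
    updated = dict(accounts)
    updated[sender] = updated.get(sender, 0) - amount
    updated[recipient] = updated.get(recipient, 0) + amount
    return updated


def first_mismatch(genesis: Dict[str, int], batch: List[Tuple[Tx, str]]) -> Optional[int]:
    """Replay the batch; return the index of the first bad entry, or None."""
    state = dict(genesis)
    for i, (tx, claimed_root) in enumerate(batch):
        try:
            state = apply_tx(state, tx)
        except ValueError:
            return i  # the transaction itself is invalid
        if state_root(state) != claimed_root:
            return i  # the published StateRoot does not match
    return None


genesis = {"alice": 10, "bob": 5}
txs = [("alice", "bob", 3), ("bob", "alice", 1)]

# An honest Sequencer publishes the root it actually computed after each tx.
honest_batch, state = [], dict(genesis)
for tx in txs:
    state = apply_tx(state, tx)
    honest_batch.append((tx, state_root(state)))

# A dishonest Sequencer swaps in a bogus root for the second transaction.
tampered_batch = [honest_batch[0], (txs[1], "00" * 32)]

print(first_mismatch(genesis, honest_batch))
print(first_mismatch(genesis, tampered_batch))
```

The index returned for the tampered batch is the entry a Verifier would cite when initiating a challenge.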
In Optimism and older versions of Metis, Sequencer will publish transaction instructions and corresponding StateRoots to Ethereum, essentially using Ethereum as a storage layer and using the Ethereum network to handle the "challenge" process. Although this can ensure data availability, the Gas consumption is very high.
Take a batch of transactions published by Optimism on Ethereum as an example. The batch contains a total of 204 transaction instructions, and the gas fee consumed exceeds US$211, which puts the storage fee of a single transaction instruction above US$1. Considering the additional Gas required to store the corresponding StateRoots for this batch, Optimism's storage fee for a single transaction can reach US$1.50, which is still too high for most users.
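The per-transaction figure follows directly from the two numbers quoted above. A quick check, using only the batch fee and the transaction count from this example:

```python
batch_fee_usd = 211  # gas fee for the batch, as quoted (a lower bound)
tx_count = 204       # transaction instructions in the batch

fee_per_tx = batch_fee_usd / tx_count
print(f"{fee_per_tx:.2f} USD per transaction")  # 1.03 USD per transaction
```

Because the quoted batch fee is a lower bound, the per-transaction figure is a lower bound as well, and it excludes the cost of storing the StateRoots.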
In response to this problem, Metis has recently made important adjustments. Metis does away with the step of storing transaction instructions directly on Ethereum and instead moves transaction batches to Memolabs, a platform similar to Filecoin but with lower storage costs and faster data retrieval. With the Memolabs storage layer integrated, the Sequencer first stores a batch of transaction instructions in Memolabs and then publishes the storage index corresponding to that batch on Ethereum. A Verifier node can read the original transaction data from Memolabs through the index value.
At the same time, since StateRoots are more critical than transaction data, they are still stored on Ethereum.
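The split between off-chain transaction data and on-chain index can be sketched as follows. The article does not say how the Memolabs storage index is derived, so this sketch assumes the index is simply the SHA-256 hash of the stored bytes; `OffChainStore`, `publish_batch`, and `read_batch` are hypothetical names, and the in-memory dictionary stands in for the real storage network.

```python
import hashlib
import json
from typing import Dict, List


class OffChainStore:
    """Hypothetical stand-in for Memolabs: maps an index to stored bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        index = hashlib.sha256(data).hexdigest()  # assumed: content hash as index
        self._blobs[index] = data
        return index

    def get(self, index: str) -> bytes:
        return self._blobs[index]


def publish_batch(store: OffChainStore, layer1_log: List[dict],
                  txs: List[list], state_roots: List[str]) -> str:
    """Sequencer side: data goes off-chain, index and StateRoots go to Layer 1."""
    index = store.put(json.dumps(txs).encode())
    layer1_log.append({"index": index, "state_roots": state_roots})
    return index


def read_batch(store: OffChainStore, record: dict) -> List[list]:
    """Verifier side: fetch by index and check the data matches that index."""
    payload = store.get(record["index"])
    if hashlib.sha256(payload).hexdigest() != record["index"]:
        raise ValueError("stored data does not match the published index")
    return json.loads(payload)


store, layer1_log = OffChainStore(), []
txs = [["alice", "bob", 3], ["bob", "alice", 1]]
publish_batch(store, layer1_log, txs, state_roots=["root-1", "root-2"])

recovered = read_batch(store, layer1_log[0])
print(recovered == txs)
```

Under this assumption the Layer 1 record has a fixed-size index regardless of how many transactions the batch holds, which is where the gas saving would come from.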
To sum up, the philosophy of Metis is that content which does not need to be stored on Ethereum should not be stored there, as long as an equivalent guarantee can be obtained in another way. This saves storage costs and reduces the cost pressure on users. It aligns with Occam's razor: "Do not multiply entities if you do not have to."
Through this storage structure, Metis can significantly reduce storage fees, bringing the fee for a single Layer 2 transaction down to a few cents. Metis now has the lowest gas fees among mainstream Layer 2 networks.
However, Metis's approach raises other questions: Does changing the storage structure change security or data availability? In this regard, we will analyze a variety of possible outcomes.
The security and data availability issues of Metis and OP Rollups have two aspects. The first question is:
When the Sequencer executes the transaction in Layer 2, it will immediately finalize it locally, temporarily possessing "finality." The specific scenario is that after a user initiates a transaction request on the Metis network, the result will be received within seconds. The question here is, is the temporary "finality" given unilaterally by Sequencer reliable?
Since the Metis Sequencer synchronizes each block to the peer nodes of the Sequencer Pool immediately after generating it, those nodes can audit the block content right away. If they find that the Sequencer has submitted an invalid block, the Sequencer can be removed from the Sequencer Pool. Therefore, the security here is equivalent to that of ordinary public chains. At the same time, the outside world can choose among multiple peer nodes as information sources without trusting any single node unilaterally, so there is no problem with data availability.
The second question is:
Will the verification process and challenge mechanism be affected after Metis transfers the transaction data to Memolabs? Will the nodes that newly join the Metis network encounter inconvenience when synchronizing historical data?
There are many possible situations here, which can be discussed case by case. Since Metis still publishes the StateRoot to Ethereum, the availability of the StateRoot is not affected. The availability of transaction data matters to two groups: existing Verifier nodes and nodes newly joining the Metis network.
For the latter, a new node only needs to synchronize historical data from other Verifiers or peer nodes, and it can also read transaction data from Memolabs and StateRoot records from Ethereum. At present, Metis has more than 80 privately run Verifier nodes, which already provide strong data availability. Considering that the number of Verifiers is still expanding, new nodes should not face many problems when synchronizing historical data.
The real question concerns the existing Verifier nodes: can they successfully obtain the transaction data and check the corresponding StateRoot? And if they find that the content submitted by the Sequencer is incorrect, can the "challenge" be successfully carried out on Ethereum?
For this problem, the following scenarios can be analyzed separately:
- If, after auditing, each transaction can be matched with the corresponding StateRoot, Verifier completes the data synchronization, and there is no need to initiate a "challenge." There is no problem at this time.
1. If Verifier finds that a particular transaction instruction and its StateRoot do not match, the StateRoot must be wrong. The Verifier can ask the Sequencer to disclose on Layer 1 the transaction data corresponding to the erroneous StateRoot.
   - If the Sequencer agrees, the "challenge" process goes smoothly, and the Sequencer is penalized;
   - If the Sequencer does not agree, the Verifier can write the transaction data read from Memolabs to Ethereum to complete the "challenge," and the Sequencer will also be punished.
Clearly, in the above scenario, data availability and the "challenge" mechanism are not affected.
2. If the Sequencer stores forged transaction instructions in Memolabs (with invalid digital signatures), the Verifier will initiate a "challenge." In addition, the Verifier must obtain the correct native Layer 2 transaction instructions in order to verify the correctness of the StateRoot.
At this point, the Verifier can ask the Sequencer to publish the relevant transaction batch on Ethereum, which costs the Sequencer a lot in gas fees and amounts to a disguised penalty. If the Sequencer refuses, the Verifier can disclose the forged data read from Memolabs on Layer 1 and start a "challenge," and the Sequencer will be punished more severely.
Under normal circumstances, the loss the Sequencer suffers after a successful challenge by the Verifier is much higher than the gas fee it would spend publishing the transaction batch on Layer 1. Therefore, when a Verifier requires the Sequencer to publish transaction data on Layer 1, the Sequencer's rational choice is to disclose the correct transaction data.
In that case, the Sequencer must publish the entire transaction batch requested by the Verifier, which contains hundreds or thousands of transactions. The gas consumed in publishing it on Layer 1 will be very high, possibly hundreds of dollars, which amounts to a disguised penalty.
As seen from the above discussion, data availability and the "challenge" process are not affected.
3. If Sequencer publishes a fake Memolabs storage index on Layer 1, Verifier cannot successfully read the data contained in the transaction batch. At this time, it can request Sequencer to disclose the transaction batch on Layer 1 as described above. If it refuses, the Verifier can obtain the corresponding data from the peer node, continue the subsequent verification work, or initiate a challenge.
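The scenarios above can be summarized as a small decision function. This is only a paraphrase of the cases described in this article, not Metis's actual protocol logic; the function name, the boolean inputs, and the returned strings are all hypothetical.

```python
def verifier_next_step(index_resolves: bool, signatures_valid: bool,
                       roots_match: bool, sequencer_discloses: bool) -> str:
    """Map what the Verifier observed to its next action.

    `sequencer_discloses` says whether the Sequencer complied with a request
    to publish the transaction batch on Layer 1.
    """
    if not index_resolves:
        # Scenario 3: the storage index published on Layer 1 is fake.
        if sequencer_discloses:
            return "verify the batch disclosed on Layer 1"
        return "fetch the batch from peer nodes, then verify or challenge"
    if not signatures_valid:
        # Scenario 2: the data stored in Memolabs is forged.
        if sequencer_discloses:
            return "verify StateRoots against the batch disclosed on Layer 1"
        return "publish the forged Memolabs data on Layer 1 and challenge"
    if not roots_match:
        # Scenario 1: a transaction and its StateRoot do not match.
        if sequencer_discloses:
            return "challenge using the data disclosed on Layer 1"
        return "publish the Memolabs data on Layer 1 and challenge"
    # Everything checks out: synchronization is complete.
    return "accept the batch"


print(verifier_next_step(True, True, True, sequencer_discloses=False))
print(verifier_next_step(True, True, False, sequencer_discloses=False))
print(verifier_next_step(False, True, True, sequencer_discloses=False))
```

In every branch the Verifier has some path to the transaction data, which is the point the article makes about data availability being preserved.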
Through the above well-designed mechanism, Metis can protect the rights and interests of Verifier nodes. However, in order to prevent Verifier from abusing its power and maliciously requiring Sequencer to write transaction data in Layer 1 and to attack honest Sequencer runners through gas consumption, Metis makes the following requirements:
- If a Verifier wants to require the Sequencer to write transaction data to Layer 1, it must pledge a certain amount of funds in advance to obtain whitelist qualification, and each such request to the Sequencer consumes a handling fee. The fee is calibrated to prevent Verifiers from frequently sending unreasonable requests to the Sequencer.
- Any node can initiate "challenges" and "fraud proofs." In theory, these nodes can cooperate to ensure data availability and security.
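The stake-plus-fee rule for disclosure requests can be sketched as a simple gate. The article gives no concrete amounts, so `MIN_STAKE` and `REQUEST_FEE` below are made-up placeholder values, and `VerifierAccount` and `request_disclosure` are hypothetical names.

```python
from dataclasses import dataclass

MIN_STAKE = 100   # hypothetical pledge required for whitelist qualification
REQUEST_FEE = 5   # hypothetical handling fee charged per disclosure request


@dataclass
class VerifierAccount:
    stake: int = 0
    balance: int = 0


def request_disclosure(verifier: VerifierAccount) -> None:
    """Charge the fee for asking the Sequencer to publish a batch on Layer 1."""
    if verifier.stake < MIN_STAKE:
        raise PermissionError("verifier is not whitelisted: stake too low")
    if verifier.balance < REQUEST_FEE:
        raise ValueError("insufficient balance to pay the handling fee")
    verifier.balance -= REQUEST_FEE


verifier = VerifierAccount(stake=100, balance=12)
request_disclosure(verifier)
request_disclosure(verifier)
print(verifier.balance)  # 2
```

Because every request costs the Verifier something, repeatedly forcing an honest Sequencer to spend gas becomes expensive for the attacker as well.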
According to the core arguments put forward above, combined with the recent official developments of Metis, it is concluded here:
The fundamental problem of OP Rollups such as Optimism and Arbitrum is the centralization of Sequencer nodes, which requires a reliable solution; Metis tries to be the first to realize the decentralization of Sequencer.