Post-Mortem: Details & Mitigation of the GovShuttle Module Bug

On September 18th, a bug in the GovShuttle module disrupted Canto Lending Market (CLM) governance by preventing proposals from being written to the governance contract on the Canto EVM. Specifically, the submission of governance proposal 16, a GovShuttle proposal intended to update comp speeds in the CLM, caused some nodes to fail. The bug predominantly affected smaller validators and, as a result, did not pose a threat to the security of the Canto network.

In the weeks since, the Plex team has conducted an extensive audit of the Canto codebase, and specifically our custom GovShuttle module. We are pleased to confirm we have identified the cause of the bug – non-determinism in the GovShuttle module – and mitigated it with the release of the v4.0.0 Canto validator binary.

Below, we provide a post-mortem of the bug in the interests of furthering the Cosmos SDK and Ethermint ecosystems and providing transparency to both Canto users and validators.

Timeline

  • September 18 – Block 804214

    • On submission of proposal 16: a GovShuttle proposal intended to update comp speeds in the CLM caused some nodes to fail.

    • Failed nodes were restarted from a snapshot taken at a block height above 804214 on a node that had not failed.

  • October 18 – Block 1244562

    • On execution of proposal 23: a GovShuttle proposal that had passed the voting stage did not append its lending market proposal metadata to the mapContract.

    • All nodes remained intact; none failed.

    • The proposal appeared successful, with no error messages.

Description of Issue

The [AppendLendingMarket](https://github.com/Canto-Network/Canto/blob/v3.0.0/x/govshuttle/keeper/proposals.go#:\~:text=AppendLendingMarketProposal) function of the GovShuttle module is called on both submission and execution of lending market proposals and is intended to append the proposal metadata to the mapContract on the Canto EVM. In Canto v3.0.0, the address of the mapContract is stored in a field on the keeper using the following lines of code:

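```go
if nonce == 0 {
	// No lending market proposal exists yet: deploy the mapContract and cache
	// its address in the keeper's mapContractAddr field, which lives in memory.
	*k.mapContractAddr, err = k.DeployMapContract(ctx, lm)
	if err != nil {
		return nil, err
	}
	return lm, nil
}
```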

This code checks if the nonce of the GovShuttle is 0, and if so, sets the mapContractAddr to the deployed mapContract’s address. The GovShuttle nonce is only 0 if no lending market proposals have been submitted. On execution of the first lending market proposal, the GovShuttle will deploy the mapContract to the EVM and the nonce will increment by 1.

The following line shows the CallEVM call that is used to append the lending market proposal metadata to the mapContract:

```go
_, err = k.erc20Keeper.CallEVM(ctx, contracts.ProposalStoreContract.ABI, types.ModuleAddress, *k.mapContractAddr, true, "AddProposal", sdk.NewIntFromUint64(m.GetPropId()).BigInt(), lm.GetTitle(), lm.GetDescription(), ToAddress(m.GetAccount()), ToBigInt(m.GetValues()), m.GetSignatures(), ToBytes(m.GetCalldatas()))
```

Because this EVM call reads mapContractAddr from the keeper's in-memory state rather than from persisted state, it introduced non-determinism between nodes. Nodes that still held the address in memory used the correct mapContractAddr, whereas nodes that had lost it (for example, after a reboot) used nil as the address value.
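To make the failure mode concrete, here is a minimal, self-contained Go sketch of the v3.0.0 pattern. It does not use the real Canto or Cosmos SDK types: `Keeper`, `NewKeeper`, `deployMapContract`, and `targetAddress` are hypothetical stand-ins, a node reboot is modeled as constructing a fresh keeper, and the lost address is modeled as the zero address.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for an EVM address.
type Address [20]byte

// Keeper mimics the v3.0.0 pattern: the map contract address lives only in a
// field on the keeper struct (process memory), not in the KV store.
type Keeper struct {
	mapContractAddr *Address
}

// NewKeeper models a node (re)starting: the keeper is rebuilt and the cached
// address starts out as the zero value (an assumption for this sketch).
func NewKeeper() *Keeper {
	return &Keeper{mapContractAddr: new(Address)}
}

// deployMapContract stands in for DeployMapContract; it records the address
// in memory only.
func (k *Keeper) deployMapContract(addr Address) {
	*k.mapContractAddr = addr
}

// targetAddress is the address an AddProposal call would be sent to.
func (k *Keeper) targetAddress() Address {
	return *k.mapContractAddr
}

func main() {
	deployed := Address{0xAB}

	// Both nodes process the first lending market proposal and cache the address.
	nodeA, nodeB := NewKeeper(), NewKeeper()
	nodeA.deployMapContract(deployed)
	nodeB.deployMapContract(deployed)

	// Node B reboots; its in-memory keeper state is rebuilt from scratch.
	nodeB = NewKeeper()

	// On the next proposal the two nodes target different addresses.
	fmt.Println(nodeA.targetAddress() == deployed)  // true
	fmt.Println(nodeB.targetAddress() == deployed)  // false
	fmt.Println(nodeB.targetAddress() == Address{}) // true: address lost
}
```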

This means that the first lending market proposal would always succeed, while subsequent proposals would have an increasing likelihood of failure over time. Eventually, all nodes would likely lose the GovShuttle keeper state held in memory, and all nodes would use nil as the address value.
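The growing risk can be illustrated with a simple model. Assuming, hypothetically, that each node restarts independently with probability p on any given day, the expected fraction of nodes still holding the in-memory address after d days is (1 - p)^d. The sketch below computes this with a hypothetical `retainedFraction` helper; the 2% daily restart rate is an arbitrary illustrative value, not data from the Canto network.

```go
package main

import (
	"fmt"
	"math"
)

// retainedFraction returns the expected fraction of nodes that still hold the
// in-memory address after `days`, assuming each node restarts independently
// with probability p per day (a hypothetical model, not measured data).
func retainedFraction(p float64, days int) float64 {
	return math.Pow(1-p, float64(days))
}

func main() {
	// Illustrative restart rate of 2% per day, chosen for the example only.
	for _, d := range []int{0, 30, 60} {
		fmt.Printf("day %2d: %.2f\n", d, retainedFraction(0.02, d))
	}
}
```

Under this model the retained fraction falls steadily toward zero, consistent with the expectation that every node would eventually lose the address.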

This explains the behavior observed on mainnet. The first two lending market proposals, 7 and 8, were successful and caused no issues for any nodes. Proposal 7 set the address in the keeper using the nonce check shown above, and because proposal 8 followed shortly after, the address persisted in each node's memory. All nodes had the same correct map contract address during this time.

However, proposals 16 and 18 caused some nodes to hit an app hash mismatch and fail. These proposals were submitted 30 days after proposals 7 and 8. During this 30-day period, some nodes likely lost their in-memory keeper state due to reboots. As a result, the nodes that still held the correct mapContractAddr stayed up, and the nodes using a nil address failed.

The most recent lending market proposal (proposal 23) did not cause any app hash mismatches and was submitted successfully, but on execution it did not append the proposal metadata to the mapContract as intended. As this was 60 days after the address was first written to the keeper, it is likely that all nodes were using the nil value at this point. It is also possible that the recent chain upgrade from v2.0.0 to v3.0.0 and the planned upgrade halt cleared the keeper state for all nodes. Because every node used the same nil address, the nodes stayed in agreement and none failed, but the proposal metadata was never written to the mapContract.
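The difference between proposals 16/18 and proposal 23 follows from how consensus treats divergence: a node halts when its app hash differs from the one the network committed, not when the result is merely wrong. The toy Go sketch below illustrates this. It assumes a hypothetical `appHash` that simply hashes the address each node wrote to (real app hashes cover the full application state), and `consensusOutcome` is a hypothetical helper that counts how many distinct app hashes the nodes produced.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Address is a hypothetical stand-in for an EVM address.
type Address [20]byte

// appHash is a toy app hash: a digest of the only state in this model, the
// address the proposal metadata was written to.
func appHash(target Address) [32]byte {
	return sha256.Sum256(target[:])
}

// consensusOutcome returns the number of distinct app hashes across nodes.
// One means every node agrees (even on a wrong result); more than one means
// nodes whose hash differs from the committed one halt with a mismatch.
func consensusOutcome(targets []Address) int {
	groups := make(map[[32]byte]int)
	for _, t := range targets {
		groups[appHash(t)]++
	}
	return len(groups)
}

func main() {
	correct := Address{0xAB}
	var lost Address // zero value: address lost from memory

	// Proposals 16 and 18: some nodes still held the address, some did not.
	fmt.Println(consensusOutcome([]Address{correct, correct, lost})) // 2
	// Proposal 23: every node had lost it, so all agree on the wrong target.
	fmt.Println(consensusOutcome([]Address{lost, lost, lost})) // 1
}
```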

Resolution

The fix for this bug was simple. The only code change required was to persist mapContractAddr to the KV store instead of holding it only in memory. Because the KV store is part of the chain's committed state, every node reads the same address regardless of reboots, which eliminates the non-determinism among nodes.
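A minimal sketch of the fixed pattern follows. It assumes a toy map-backed `KVStore` in place of the Cosmos SDK store and a hypothetical key `mapContractAddrKey`; it illustrates the idea rather than reproducing the v4.0.0 code. The keeper no longer caches the address in memory: every read goes to the persisted store, so a keeper rebuilt after a restart sees the same address as every other node.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for an EVM address.
type Address [20]byte

// KVStore is a toy stand-in for the module's persistent store, which survives
// restarts (hypothetical; the real module uses the Cosmos SDK store).
type KVStore map[string][]byte

// mapContractAddrKey is a hypothetical store key for the map contract address.
const mapContractAddrKey = "MapContractAddr"

// Keeper holds only a handle to the store; no address is cached in memory.
type Keeper struct {
	store KVStore
}

// SetMapContractAddr persists a copy of the address bytes to the store.
func (k Keeper) SetMapContractAddr(addr Address) {
	k.store[mapContractAddrKey] = append([]byte(nil), addr[:]...)
}

// GetMapContractAddr reads the address from the store on every call and
// reports whether it has been set.
func (k Keeper) GetMapContractAddr() (Address, bool) {
	var addr Address
	bz, ok := k.store[mapContractAddrKey]
	if !ok {
		return addr, false
	}
	copy(addr[:], bz)
	return addr, true
}

func main() {
	store := KVStore{}
	deployed := Address{0xAB}

	Keeper{store: store}.SetMapContractAddr(deployed)

	// Simulate a restart: a fresh keeper is built over the same persisted store.
	restarted := Keeper{store: store}
	addr, ok := restarted.GetMapContractAddr()
	fmt.Println(ok && addr == deployed) // true
}
```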

For more information on the custom GovShuttle module, you can review Canto governance documentation as well as our initial overview.


About Plex

Plex is a group of chain-native builders with backgrounds in HFT, mechanism design, and software development. We are currently exploring the intersection of decentralized finance and social coordination. If interested in collaboration, please reach out at twitter.com/Plex_Official.
