How a medium severity issue can become critical: our recent Nouns DAO audit

In July 2023 we ran two audits for a major Nouns DAO upgrade, first with Spearbit and then with code4rena. During the Spearbit audit, one of the fixes we made for a medium-severity issue introduced a new critical issue; it was not caught in the first audit’s mitigation review, but was luckily caught in the second audit.

We messed up but got lucky, and we worry this might happen in other projects. We wanted to share a short retrospective and some mitigation ideas, hoping to help others avoid such mistakes in the future.

The issue in detail

If you’re not interested in the technical details of the issues, feel free to skip this section.

A bit of context to help understand the issue:

  • A key change in this project added a minority protection mechanism to Nouns DAO, which lets a minority group fork off to a new instance of Nouns DAO; let’s call those fork DAOs.

  • Fork DAOs come with a classic ragequit mechanism, allowing a token holder to quit by giving up their token(s) and receiving their fair share of the fork DAO’s treasury.

  • One’s fair share of the treasury can include ETH as well as ERC20 tokens. The ERC20 tokens sent upon quit are set in a whitelist configured by the DAO.
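To make the mechanism above concrete, here is a minimal Python sketch of a classic ragequit payout (the real contracts are Solidity; the function names and integer math here are illustrative assumptions, not the actual Nouns code):

```python
# Hypothetical sketch of a ragequit payout: a quitter gives up their
# tokens and receives a pro-rata share of each whitelisted treasury asset.

def fair_share(treasury_balance: int, quitter_tokens: int, total_supply: int) -> int:
    """Pro-rata share, rounded down as on-chain integer math would be."""
    return treasury_balance * quitter_tokens // total_supply

def quit(treasury: dict, quitter_tokens: int, total_supply: int) -> dict:
    """Pay the quitter their share of every whitelisted asset (ETH + ERC20s)."""
    payout = {}
    for asset, balance in treasury.items():
        share = fair_share(balance, quitter_tokens, total_supply)
        treasury[asset] -= share  # treasury shrinks by what was paid out
        payout[asset] = share
    return payout

# Example: holding 10 of 100 tokens entitles the quitter to 10% of each asset.
treasury = {"ETH": 1000, "USDC": 50_000}
print(quit(treasury, 10, 100))  # {'ETH': 100, 'USDC': 5000}
```

The key property is that the payout is computed per whitelisted asset, which is why control over the whitelist matters in the issue discussed next.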

Now we can talk about the original medium issue Spearbit found (issue 5.2.16). The gist is:

  • If there is a malicious majority of token holders in the fork DAO.

  • Then they can pass a proposal to change the whitelist of ERC20s, to include faulty addresses, such that the quit function always reverts.

  • Thereby disabling the quit function for all other fork DAO members.

We followed the auditors’ recommendation and submitted a change that allows token holders to specify a subset of ERC20s to receive when they quit. The change was approved during the mitigation review process.

The critical issue we accidentally introduced, which was caught by code4rena (issue H-01), allows fork DAO members who quit to steal tokens by submitting duplicate ERC20 contract addresses to the modified quit function.
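A hedged Python sketch of this bug class (again, the real contracts are Solidity and the names here are illustrative): once quitters supply the token list themselves, a loop that never checks for duplicates pays out the same token’s share more than once.

```python
# Illustrative sketch of the duplicate-address bug: the quitter now
# controls which ERC20s to receive, and the loop pays per list entry.

def buggy_quit(treasury: dict, requested_tokens: list, quitter_tokens: int, total_supply: int) -> dict:
    payout = {}
    for token in requested_tokens:  # attacker-controlled list
        share = treasury[token] * quitter_tokens // total_supply
        treasury[token] -= share
        payout[token] = payout.get(token, 0) + share  # duplicates pay again
    return payout

def safe_quit(treasury: dict, requested_tokens: list, quitter_tokens: int, total_supply: int) -> dict:
    # One common guard (an assumption here, not necessarily the fix the
    # Nouns team shipped): require the list to be strictly ascending,
    # which makes duplicate entries impossible to submit.
    assert all(a < b for a, b in zip(requested_tokens, requested_tokens[1:])), \
        "token list must be strictly ascending (no duplicates)"
    return buggy_quit(treasury, requested_tokens, quitter_tokens, total_supply)

# Holding 10 of 100 tokens should pay 10% once; listing the same token
# twice pays a second (slightly smaller) share from the reduced balance.
honest = buggy_quit({"USDC": 1000}, ["USDC"], 10, 100)
attack = buggy_quit({"USDC": 1000}, ["USDC", "USDC"], 10, 100)
print(honest, attack)  # {'USDC': 100} {'USDC': 190}
```

The second payout is computed against the already-reduced balance, so duplicates don’t quite double the take, but a long enough list of repeats drains far more than the quitter’s fair share.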

How did this happen?

The flaws in our approach

We felt under significant pressure to ship fast; many DAO members were impatient to have this version live. Since our north star is serving the needs of the DAO, we took that pressure to heart, perhaps more than we should have. We were rushing this version forward way more than we would have without that pressure.

This rushing sentiment played a key role in other aspects mentioned below.

Mitigation review

Spearbit’s audit process included a two-week fix period during which the audit team was available for up to two business days to review fixes we submitted for issues they found. It was during this period that we introduced the critical issue that was missed.

In hindsight, we relied too heavily on the mitigation review. We advise protocol teams to carefully consider every fix, and to not regard time-limited mitigation reviews with the same trust as the primary audit period.

We would like to invite Spearbit to share their learnings with us and the broader community; what could have been done differently?

Signal to noise in issues found

One of our assumptions going into an audit with a reputable org like Spearbit is that we won’t need to review & respond to many minor issues. However, this wasn’t the case.

A lot of our time was spent discussing why we think certain issues are not interesting to us.

Specifically, there were multiple occasions where medium-severity issues assumed that a DAO could pass visibly malicious proposals. Quoting from the report:

While the governance process gives an opportunity to detect and block such malicious proposals, the assumption is that a malicious majority can force through any proposal, even a visibly malicious one. Also, it is not certain that all governance proposals undergo thorough scrutiny of security properties and their impacts. Token holders need to actively monitor all proposals for malicious updates to create, execute, and join a fork before such a proposal takes effect.

While we agree with the general statement, this is an issue with DAOs at large, and the forking feature introduced is actually a way to help mitigate this risk.

These kinds of issues are labeled as medium-severity due to Spearbit’s risk classification matrix:

Low likelihood x High impact = Medium severity.

Because the lowest likelihood level is “low”, any high-impact issue, no matter how unlikely, is classified as Medium severity.

We recommend adding another likelihood level below “low”, such that: Very low likelihood x High impact = Low severity.

Another reason we chose to fix certain issues was so they wouldn’t get flagged during the subsequent code4rena audit. In hindsight, it would’ve been better to leave them as is and mark them as “known issues” for the second audit.

Fixing medium severity issues might not be worth the risk

Making changes to critical areas of the protocol at such a late stage of the development cycle, especially while trying to meet deadlines, might not be worth the risk.

Fixes introduced at this stage will probably not get the same level of testing and attention they require. Knowing that, it may be better to leave some fixes for a future version.

For this reason, we decided to leave some of the issues found in the code4rena audit unfixed for now.

What can we do to improve?

Protocol teams, do your best to allocate ample time for audits, and help your devs feel safe to optimize for quality and less for deadlines. Mistakes during this period can be very costly. And don’t feel like you have to fix every issue; if it’s not a must, err on the side of letting it go for now.

The labeling of issues by severity is useful for understanding which issues should be fixed, but it does not provide the whole picture. When deciding whether an issue should be fixed, the complexity of the potential fix is an important consideration. For example, a medium-severity issue with a trivial fix should probably be fixed. On the other hand, if a non-trivial fix is recommended, more careful assessment is required of whether the fix is worthwhile, and what kind of review or follow-up audit should be done.

We think that auditors could provide more guidance on which issues they recommend fixing. For example:

Issue: Some issue with proposal creations
Severity: Medium
Potential Fix: Change the rules for how a proposal can be created
Recommendation: Don’t fix in this version due to high complexity. Consider fixing in future protocol versions with additional audits.

Perhaps auditing teams can check in with their clients during the audit and ask if they are happy with the signal-to-noise ratio. If the clients are experiencing fatigue, it seems very important to pause and discuss how to improve; the mid-audit client experience seems underrated.

Specifically for DAO audits, audits can be much better if there’s early alignment on how to treat all issues that start with a malicious majority assumption; we suspect most protocol teams will prefer to spend very little time on those issues.

Above all, let’s have more public conversations about the auditing experience from both sides and continue iterating on making it awesome.

Thanks for reading,

verbs ⌐◨-◨
