How to Build a Crypto Project like an Aerospace Engineer

Within Crypto, I don’t think there’s a single truism more often stated than “we should build smart contracts like airplanes, not iPhone apps!”

In fact, just the other day someone shared this with me:

But no one ever tells you how to actually do it. How does one shift their skills, thought process, and mindset away from building and launching projects rapidly to building them like they’re flying to the moon? Where’s the handbook for learning how to do this? Why can’t anyone tell me?

*Crickets.*

Well, thankfully, I happen to have a background in both hardware engineering and building aircraft, and I've now been building smart contracts for almost as long as I spent in my previous career as a flight control engineer. I'm gonna give you all the secrets that no one seems to want to share about how to build projects that always work, never get hacked, and make you millions in the process.

Okay, so I can't actually do those things, but I have your attention now, don't I?

Alright, what are these pearls of wisdom?

Failure is not an option

It’s just reality.

All software projects have failures. What sets apart good projects from great ones isn’t how perfect the code is, or how much was spent on audits, or even how big the test suite is; it’s way simpler than that:

It’s planning for failure before it happens.

If you can't reason about and rattle off a dozen or so failures that could happen when launching your brand new application, you're doing it wrong. The weeks you spent gas-optimizing your code so it's as efficient as humanly possible would be better spent making sure you understand not only the practical ways the code could fail, but also what mitigations you have (or can make available to your users) so that the impact of those failures is as small as possible.

In aerospace engineering, one of the standard tools is an FMECA (or Failure Modes, Effects, and Criticality Analysis) study, which identifies every possible failure for each of the complex components within the system. This analysis is performed during the design phase of the project to ensure there are protections and mitigations in place for when different types of components (hardware, software, hydraulic, electrical, etc.) develop a fault that affects their operation and impacts the components around them.

An example summary of the types of information an FMECA contains

Typically, this process leads to designing physical redundancies and failure handling into the system, so that if an issue occurs, the system has an adequate chance of surviving the incident while minimizing damage to itself, its environment, and the occupants inside. This process leads to the creation of an FHA (or Functional Hazard Assessment), the high-level overview of the major faults that bubble up from the FMECA study, along with the likelihood and impact of any critical events. All of this analysis happens in the design phase of the aircraft, meaning that years before you even get to build your aerospace project, you are aware of every single major fault the system could ever experience in practice.
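To make that concrete, here's a minimal sketch of what a lightweight, FMECA-style failure catalogue for a smart contract could look like, written in plain Python. The lending-pool component, its failure modes, severities, and mitigations are all hypothetical examples, not a complete or authoritative list:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str      # which part of the system can fail
    failure: str        # what goes wrong
    effect: str         # impact on users / the protocol
    severity: int       # 1 (annoyance) .. 5 (catastrophic loss of funds)
    likelihood: int     # 1 (rare) .. 5 (expected under normal use)
    mitigation: str     # what is in place (or planned) to limit the damage

# Hypothetical entries for an imaginary lending pool -- illustrative only.
FMECA = [
    FailureMode("Price oracle", "Stale or manipulated price feed",
                "Bad-debt positions can be opened or wrongly liquidated",
                severity=5, likelihood=3,
                mitigation="Sanity-check against a second feed; pause borrows on divergence"),
    FailureMode("Liquidation bot", "Bot offline during volatility",
                "Underwater positions accumulate bad debt",
                severity=4, likelihood=3,
                mitigation="Permissionless liquidations plus an internal backup keeper"),
    FailureMode("Admin keys", "Compromised deployer key",
                "Attacker can upgrade or drain the protocol",
                severity=5, likelihood=2,
                mitigation="Multisig + timelock on all privileged functions"),
]

# The FHA-style summary is just the highest-criticality items bubbled up.
for fm in sorted(FMECA, key=lambda f: f.severity * f.likelihood, reverse=True):
    print(f"[{fm.severity * fm.likelihood:>2}] {fm.component}: {fm.failure} -> {fm.mitigation}")
```

Even a table this crude forces you to write down, before launch, what can break and what you'll do about it.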

Thankfully, smart contracts aren't nearly as complex as aircraft, and there's a fairly well-known set of issues that can occur with them, so you can conduct a rigorous analysis like this using available tools (and a little imagination) and pretty much figure out all the worst-case scenarios within a day or two; a week tops. You also don't have to do it before you're approved to build your project; in aerospace, that requirement is driven by FAA regulation meant to improve the odds of success for new aircraft (even though it adds a lot of overhead to the design phase).

Which brings me to my next point…

It has to work, every time

Well, not really.

Or, I guess I should say that the steps you take along the way don’t necessarily have to all work out of the gate. It’s the final product that counts.

A good practice when designing a new complex system is to identify the things that are truly novel, and then identify the things for which a common approach or an existing solution is already available. Once you've done that, you can minimize the set of things you need to experiment with before you're ready to combine it all into the final product (which, if you've done your experiments well, should indeed work pretty well).

In aerospace, there is a tool called TRL (or Technology Readiness Level), a very cool framework invented by NASA that classifies the maturity of a technology as it progresses from ideation, to simulation, to lab testing, to first flight, to heavy use in production. The idea is that if you can classify the TRL of all the major components of your system, you can identify which parts are truly novel (meaning there is no mature option you can use) and which parts have existing alternatives that work well enough for your intended use cases. By doing this, you increase the aggregate maturity of your project and give it the best chance of success, since you've minimized the number of "new things" you're working with that require enhanced verification work before launch.

NASA's TRL scale
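As a rough sketch of how this translates to a smart contract project, here's a tiny example (plain Python, with entirely made-up components, scores, and threshold) of tagging each piece with a TRL-style maturity level and letting the lowest scores tell you where the real verification work needs to go:

```python
# TRL-style maturity levels, loosely following NASA's 1-9 scale.
# All components and scores below are made up for illustration.
components = {
    "Uniswap pools (reused)":         9,  # battle-tested in production for years
    "OpenZeppelin ERC-20 base":       9,
    "Cross-chain message relayer":    4,  # prototype validated on a testnet only
    "Novel mirror-token mint/burn":   3,  # proof of concept, local tests only
}

MATURE_ENOUGH = 7  # assumed threshold for "reuse as-is"

novel = {name: trl for name, trl in components.items() if trl < MATURE_ENOUGH}

print("Focus verification effort (audits, simulation, testnets) on:")
for name, trl in sorted(novel.items(), key=lambda kv: kv[1]):
    print(f"  TRL {trl}: {name}")

# The project is only as mature as its weakest critical component.
print("Aggregate (weakest-link) maturity:", min(components.values()))
```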

This is a good skill to learn: many engineers at the highest level get very good at researching and identifying pieces of technology that meet the needs of the project, reducing the amount of new code that needs to be created. In Crypto, this is a necessary skill because new code has a much higher barrier to cross in order to "qualify" it for production (testing, audits, etc.). Basically: reuse what you can, and focus your innovation on what is truly innovative.

But most of all, the only truly reliable test of secure software is how much exposure it has had in production: a crude measure of how long it has run in that environment, at intended levels of use (the more the better), and the range of use cases it has successfully served without failure. Software components with a high "TRL" score highly in all of those dimensions.

Let's take a practical example: say you're designing a new bridging protocol. You want to build an innovative cross-chain asset transfer system using mirror tokens on other networks. To do this, you also need to establish a trading market so those tokens retain their relative worth. Well, part of that system already exists: we can use Uniswap's AMM (which is deployed on multiple chains) to ensure that bridged assets stay close to parity with their copies on the originating chain. It makes no sense for us to reinvent the wheel here; Uniswap already works quite well in production, and it's not the core innovative component we are developing.
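As a rough illustration of what "reusing the mature piece" looks like in code, here's a minimal sketch using Ape's Python API to quote a price through the already-deployed Uniswap V2 router instead of writing our own AMM. It assumes you're connected to Ethereum mainnet (or a fork) with an explorer plugin configured so the ABI can be fetched; the router and WETH addresses are the well-known mainnet deployments, and the mirror-token address is a placeholder for whatever we actually ship:

```python
# Sketch: reuse the existing Uniswap V2 router for pricing rather than
# deploying a custom AMM. Assumes an explorer plugin (e.g. ape-etherscan)
# is installed so the ABI can be resolved, and that this runs in a script
# or `ape console` session connected to Ethereum mainnet (or a fork).
from ape import Contract

# Canonical UniswapV2Router02 and WETH addresses on Ethereum mainnet.
router = Contract("0x7a250d5630B4cF539739dF2C5dAcb4c659F2488D")
WETH = "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"

# Placeholder -- swap in the real mirror-token address once deployed;
# the quote below only succeeds once a WETH/mirror pool actually exists.
MIRROR_TOKEN = "0x0000000000000000000000000000000000000000"

# Quote how much of the mirror token one WETH buys through the existing pool.
amounts = router.getAmountsOut(int(1e18), [WETH, MIRROR_TOKEN])
print("1 WETH ->", amounts[-1], "mirror tokens")
```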

Since we're using an established, mature component for that part of the project, we can focus more on the innovative and difficult parts, which makes the overall project much more likely to succeed. And that's great because…

You only have one shot to make it

The last point in the tweet about having a “[limited] number of shots to achieved PMF” (product market fit, for those who don’t speak VC) is definitely very true. But it’s also predicated on a lie.

Let me explain.

The lie is that you need to achieve PMF almost immediately: the amount of time it takes to "qualify" your software for production is so high that it severely limits the number of times you can afford to rebuild everything from scratch.

Wait, why are we rebuilding things from scratch?

We covered this in the previous section (kinda), but if we've done a good job of reducing the number of truly novel ideas we're trying to explore, we've set ourselves up for success: we don't actually have to rebuild everything from scratch, just the parts that didn't work as we expected.

You see, the best way to succeed at building complex systems is to break them down into smaller, self-contained components that can be independently iterated on and tested before being combined into the larger application.

This is probably the hardest part, though, because truly "testing" novel smart contract mechanisms is genuinely difficult: the "complex" part of the system is usually the cryptoeconomic scheme we are all trying to play with. We don't really know how to simulate and test those (outside of simple modeling), so our only choice is to "test in prod" (aka experiment on live, willing degens) once we've reached the limit of what local testing and simulation can show us.

So, there’s nothing we can do?

Well, we can take another page out of the aerospace handbook and follow a tiered approach to testing and release, de-risking the likelihood of failure at each step (and increasing our chances of hitting PMF, if done correctly). In fact, one of the reasons we created the cHaOSneT ephemeral testnet product is that we think there's a gap between local testing and production that needs to be filled by a full system-in-the-loop integration testing platform. Such a platform can also be used to run full-stack simulations of our components, pre-launch "public" testnets for our userbase, and continuous post-launch "fire drills" for the team maintaining the project, all in a controlled environment with limited impact on project teams and their users. The nearest analogue we have in aerospace is creating Ground Test and Flight Test procedures.
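To make the "fire drill" idea a bit more concrete before we get to those, here's a toy sketch of a drill runner: it replays the failure modes you catalogued earlier against a disposable environment and checks that each mitigation actually holds. Every name below is hypothetical, and the in-memory stub stands in for a real forked or ephemeral testnet harness:

```python
# Hypothetical fire-drill runner: replay known failure scenarios against a
# throwaway environment and confirm each mitigation behaves as expected.

class StubEnv:
    """Toy in-memory stand-in for a real forked/ephemeral testnet harness."""
    def __init__(self):
        self._oracle_stale = False
        self._keeper_online = True

    def freeze_oracle(self):
        self._oracle_stale = True

    def stop_keeper(self):
        self._keeper_online = False

    def advance_time(self, hours):
        pass  # a real harness would warp the chain clock here

    def borrows_paused(self):
        return self._oracle_stale  # mitigation: pause borrows on stale prices

    def bad_debt(self):
        return 0  # mitigation: backup/permissionless liquidations cover the gap

    def teardown(self):
        pass


def oracle_goes_stale(env):
    env.freeze_oracle()                       # inject the fault
    assert env.borrows_paused(), "borrows should pause on stale prices"

def keeper_goes_offline(env):
    env.stop_keeper()
    env.advance_time(hours=6)
    assert env.bad_debt() == 0, "backup liquidations should cover the gap"


FIRE_DRILLS = [oracle_goes_stale, keeper_goes_offline]

def run_drills(make_env=StubEnv):
    for drill in FIRE_DRILLS:
        env = make_env()
        try:
            drill(env)
            print(f"PASS {drill.__name__}")
        except AssertionError as err:
            print(f"FAIL {drill.__name__}: {err}")
        finally:
            env.teardown()

if __name__ == "__main__":
    run_drills()
```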

Typically, when aerospace projects get pretty far along in their development, they go through Ground Test procedures designed to validate that all these larger components actually work. It's called "Ground Testing" because the components can be tested in a controlled environment where things are bolted down and secured to the ground. A great example of this is rocket "hot fire" tests, which are basically just a way to confirm that the (extremely complicated) rocket engines actually do their job.

For Crypto, the way we can simulate a "Ground Test" of some component is by running larger-scale, agent-based simulations demonstrating that the component functions as intended across a variety of different environments. A great example of a project doing this in practice is DelV's (formerly ElementFi's) simulation framework for the next iteration of their protocol:
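For a flavor of what a miniature agent-based "ground test" can look like, here's a toy sketch: a made-up constant-product pool gets hammered by random traders across several seeded scenarios, and we check that the property we care about (the price staying within an assumed band) holds. This is purely illustrative and is not DelV's framework:

```python
import random

class Pool:
    """Toy constant-product AMM pool; purely illustrative numbers."""
    def __init__(self, x=1_000_000.0, y=1_000_000.0):
        self.x, self.y = x, y                  # reserves of the two assets

    def swap_x_for_y(self, dx):
        k = self.x * self.y
        self.x += dx
        dy = self.y - k / self.x
        self.y -= dy
        return dy

    def swap_y_for_x(self, dy):
        k = self.x * self.y
        self.y += dy
        dx = self.x - k / self.y
        self.x -= dx
        return dx

    def price(self):
        return self.y / self.x


def simulate(num_agents=50, steps=200, seed=0):
    """Random traders hit the pool in both directions; track how far the
    price drifts from parity over the whole run."""
    random.seed(seed)
    pool = Pool()
    worst = 0.0
    for _ in range(steps):
        for _ in range(num_agents):
            trade = random.uniform(0, 1_000)   # arbitrary trade-size model
            if random.random() < 0.5:
                pool.swap_x_for_y(trade)
            else:
                pool.swap_y_for_x(trade)
        worst = max(worst, abs(pool.price() - 1.0))
    return worst


if __name__ == "__main__":
    for scenario_seed in range(5):             # a "variety of environments"
        deviation = simulate(seed=scenario_seed)
        status = "OK  " if deviation < 0.20 else "FAIL"   # assumed +/-20% band
        print(f"{status} seed={scenario_seed} max price deviation={deviation:.2%}")
```

A real ground test would use far richer agent behaviors and the actual contract code, but the shape is the same: many scenarios, one invariant, automated pass/fail.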

Anyways, Ground Tests validate that, as the project comes together, all the largest pieces are working as intended, which is really convenient for de-risking probably the scariest part of any aerospace project's development: Flight Testing.

Flight Testing is scary because no matter how much simulation and testing we've done, there's still no way to know for sure that it's all going to work together. Often, the full system simply can't be tested in a controlled environment or in a truly "safe" way. The best we can do is limit the scope and envelope of our test flights so that we achieve the key objectives needed to prove the system is fully capable of the mission we designed it for.

Still, worst-case scenarios do happen, but we're prepared for them: we knew from the start what the worst outcomes could be, and we have backups and contingencies for the failures that are possible (since we already did our analysis). This is how we make sure our project pushes the envelope of what's possible in the safest way we can, so that scenarios like a rocket blowing up aren't catastrophic to the project:

For a Crypto project, what we can do is run a Beta launch or some sort of incentivized testnet, meant to uncover bugs, inefficiencies, and places where we misunderstood the problems we expected the project to face during the design phase. A good team will have laid out a roadmap of these tests, all driving towards the final stage of the build process: actually launching the damn thing for everyone to use!

Conclusion

I hope some of these insights from my prior career are helpful to you when thinking about how you can best design and build safe, secure, and successful projects that work well. I want to thank @danielvf and @0x_Osprey for reviewing this article and giving great feedback.


For the latest on all things Ape follow us on: apeworx.io | Discord | Twitter | Bluesky
