Structuring a Fair Distribution for the Legendary Smurf Reveal

Like many blue chip NFT collections before it, the Smurfs’ Society Legendary Collection consists of profile pictures (PFPs) with varying traits and degrees of rarity. The value of these PFPs lies in the combination of traits they have. The higher the rarity tier of an NFT, the greater its value compared to those with more common traits.

With many of these collections, buyers mint or purchase their PFPs blind, without knowing which traits they will have, until they are “revealed” at a later date. Because rarity significantly influences the price on secondary markets, it is critical to ensure a fair and transparent distribution (i.e. to make sure that rare NFTs cannot be purposely distributed to certain buyers). This article gives a technical overview of how we designed our reveal to do just that.

Introduction

Before going over the reveal strategy itself, let’s look at the different reveal flow options we discussed (and implemented) and their possible impacts:

  • Option 1: generate the ID of the token during the reveal process and associate it with the internally assigned ID. 

  • Option 2: keep the ID identical to reduce the costs of burning and minting a new PFP and save the reveal status on-chain.

  • Option 3: same as option 2, but save the reveal status off-chain. 

We can summarize the pros and cons of the three solutions in the following table:

To avoid incurring transaction fees, we chose the fully off-chain solution. Now, because off-chain data is very easy to manipulate, we needed to define an innovative way to demonstrate fairness before the reveal started. To achieve this, we decided to go with the provenance hash solution used by Bored Ape Yacht Club, with some small variations. 

In the following sections, we will cover the key topics required to ensure fairness and transparency by:

  • defining the proof of no manipulation (provenance hash)

  • defining the shuffling process used

  • mapping the IDs from the shuffling outcome to the pre-existing IDs

What’s a provenance hash?

Before diving into the details, let’s go over what a provenance hash is. Consider a collection with 3 images, each having the following hashes (a hash is a fixed-length string produced by applying a one-way function to a piece of data; it is extremely difficult to reverse and can be treated as a unique fingerprint of that data):

Image 1: 1CD435E211E255D8234CCA4F751AF79BA125E99E161F66AB9BCD977200712D5D

Image 2: 85AD2E7407A203E8E2B86C036DFEF379F6AADBADFAE4E424AEF44E20E2B9864E

Image 3: 2950A947656CE5B11101A9C487452514DFB7118CAB29EAB4CBB8EAF774D585DD

The provenance hash will be the hash of the following chain of characters:

1CD435E211E255D8234CCA4F751AF79BA125E99E161F66AB9BCD977200712D5D85AD2E7407A203E8E2B86C036DFEF379F6AADBADFAE4E424AEF44E20E2B9864E2950A947656CE5B11101A9C487452514DFB7118CAB29EAB4CBB8EAF774D585DD

Which would result in the following hash:

echo "1CD435E211E255D8234CCA4F751AF79BA125E99E161F66AB9BCD977200712D5D85AD2E7407A203E8E2B86C036DFEF379F6AADBADFAE4E424AEF44E20E2B9864E2950A947656CE5B11101A9C487452514DFB7118CAB29EAB4CBB8EAF774D585DD" | sha3sum -a 256

64f3e7ab57007a003ed8722a2bf3f2df3ac2975ec59ed957168060aa006c856d

Because the provenance hash is a hash of hashes concatenated in a very specific order, any change in the order of the images would result in a different hash:

echo "2950A947656CE5B11101A9C487452514DFB7118CAB29EAB4CBB8EAF774D585DD85AD2E7407A203E8E2B86C036DFEF379F6AADBADFAE4E424AEF44E20E2B9864E1CD435E211E255D8234CCA4F751AF79BA125E99E161F66AB9BCD977200712D5D" | sha3sum -a 256

24d748002b8a85e0cba8a82b5490a676d35970907863dcbd771aefdc7b8e4b34

Because any change of order would generate a different provenance hash, making this hash public gives full confidence that no manipulation can be made to the images post-publishing.
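The same construction can be sketched in Python using the standard library’s SHA3-256. (A sketch, not our production code; note that the digests here will not match the shell output above byte-for-byte, because plain `echo` appends a trailing newline that gets hashed too.)

```python
import hashlib

# The three per-image hashes from the example above.
image_hashes = [
    "1CD435E211E255D8234CCA4F751AF79BA125E99E161F66AB9BCD977200712D5D",
    "85AD2E7407A203E8E2B86C036DFEF379F6AADBADFAE4E424AEF44E20E2B9864E",
    "2950A947656CE5B11101A9C487452514DFB7118CAB29EAB4CBB8EAF774D585DD",
]

def provenance_hash(hashes):
    # Concatenate the per-image hashes in order, then hash the result.
    concatenated = "".join(hashes)
    return hashlib.sha3_256(concatenated.encode("ascii")).hexdigest()

original = provenance_hash(image_hashes)
reordered = provenance_hash(list(reversed(image_hashes)))

# Any reordering of the images yields a different provenance hash.
print(original != reordered)  # True
```

Publishing `original` before the reveal commits to the exact image order without revealing it.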

Shuffling algorithm

On its own, the provenance hash demonstrates that no manipulation happened after publication, but it does not make the distribution fair. We, the team, could allocate the rarest PFPs to ourselves and still produce a valid provenance hash. To make it fair, we needed to introduce complete randomization into the distribution process when assigning an image to a given ID or sequence number later used in the provenance hash calculation. And this process must be externally verifiable: it’s not enough to say we did it; we need to show how we did it and what data we used for it. 

So what’s a deterministic way to shuffle data? Here is a small snippet of code in Scala (one of our preferred programming languages) demonstrating how it works:

val random = new scala.util.Random(41L) // a fixed seed makes the shuffle deterministic
random.shuffle(1 to 50)

Which would always give us the same result - a shuffled vector of IDs:

(24, 23, 38, 16, 36, 9, 33, 21, 37, 31, 6, 10, 17, 14, 44, 7, 48, 43, 4, 50, 47, 34, 20, 46, 11, 3, 28, 19, 2, 22, 45, 49, 13, 12, 39, 29, 35, 41, 40, 26, 32, 5, 42, 27, 15, 30, 18, 1, 25, 8)
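The same determinism can be sketched in Python. (Python’s PRNG differs from Scala’s, so the concrete permutation will not match the vector above; the point is only that a fixed seed makes the shuffle reproducible.)

```python
import random

def deterministic_shuffle(n, seed):
    # Seeding the PRNG makes the shuffle reproducible: anyone with the
    # same seed can recompute the exact same permutation of 1..n.
    rng = random.Random(seed)
    ids = list(range(1, n + 1))
    rng.shuffle(ids)
    return ids

# Two runs with the same seed always agree.
print(deterministic_shuffle(50, 41) == deterministic_shuffle(50, 41))  # True
```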

Now, to make this shuffling process fair and transparent, we needed to ensure that:

  • the data was externally verifiable - we decided to use blockhash

  • it was totally unpredictable - we could not know the externally verifiable data beforehand, so we could not check whether a given outcome was favorable to us

  • extraction of randomness could only be done once - same as before: if I can redo the extraction, I can repeat it until I’m happy with the outcome

We achieved the previous points with the following strategy, using two different block hashes as seeds:

  • The first one was used when we saved the root provenance hash (calculated from all images in the order they came out of the artwork production team). The transaction can be found here. The resulting seed was:

    0xae8b799c39509cefaa6ef0719d888aeabf8ced9c53f01423d4e2a201c05efb40
    
  • The second one was defined in a second transaction but depended on the first one: when we executed the first transaction, it also defined the future block whose hash would be used. The resulting seed was: 

    0x1872da5c60b848377a2777cb3b8b4ff1962adb7536881ad60a4d46c32d750008
    

For the last condition, you can refer to the on-chain code here for the following functions:

- setUnshuffledProvenanceHash
- recordSeedFromDefinedBlock
- setProvenanceHash
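A block hash can be turned into a shuffle seed by interpreting the 32-byte hash as an integer and feeding it to a seeded PRNG. Here is a Python sketch of that idea (the function name is illustrative, not the on-chain contract’s API, and the real process used Scala):

```python
import random

def seed_from_blockhash(blockhash_hex):
    # Interpret the hex-encoded 32-byte block hash as a big integer.
    return int(blockhash_hex.removeprefix("0x"), 16)

# The first seed published on-chain (quoted from the article).
seed = seed_from_blockhash(
    "0xae8b799c39509cefaa6ef0719d888aeabf8ced9c53f01423d4e2a201c05efb40"
)

rng = random.Random(seed)
ids = list(range(1, 51))
rng.shuffle(ids)
# Anyone can recompute this exact permutation from the public block hash.
```

Since the block hash is public and fixed once the block is mined, the permutation is both unpredictable beforehand and verifiable afterwards.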

Token mappings

So now that we had a way to properly shuffle the data and prove it was not changed before or after, we just needed to properly map the token IDs to our sequential sale and the different mint flows (crystals, bucket) we completed. The particular aspect of our case is that crystal Smurfs are considered semi-revealed, since each is tied to a very specific Smurf character.

In the first batch of possible Smurfs (50 characters have not been designed and are not available yet), we have 10k PFP tokens produced by the Production artwork team that needed to be mapped:

  • 5k PFPs are known with a predefined character, meaning a holder of a Papa Smurf PFP (originating from a crystal flow) is guaranteed to extract a Papa Smurf image post reveal.

  • 5k PFPs are 100% random, meaning there are no constraints.

To do so, we took a 3-step approach:

  • Phase 1: calculate the reference hash for each image and save all images into a cloud object store (later moved to IPFS). All PFPs were given a sequence number from 1 to 10,000, with 50 contiguous IDs for each Smurf (e.g. Lucky Smurf IDs are from 1 to 50, Hunter Smurf IDs from 51 to 100, etc.). The provenance hash calculated here will be saved on-chain.

  • Phase 2: map the files in an order that respects the semi-revealed constraint for the crystals. We also calculated an associated provenance hash but it is not relevant per se since this is a transition state. 

  • Phase 3: final shuffling from the blockhash generated on-chain.

Each one of these phases required us to shuffle in a different way, which we’ll describe in the following sections. 

Phase 1

To illustrate this phase, it’s best to take an example. Let’s use Hacker Smurf, which has the system ID # 148 (set during the gamification phase). The token range outcome for this Smurf is: 7351 -> 7400. 
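The Hacker Smurf numbers are consistent with a simple contiguous layout: with 50 PFPs per character, a character’s token range can be derived from its system ID. A small sketch (assuming, as the example suggests, that ranges simply follow the system ID):

```python
PFPS_PER_CHARACTER = 50

def token_range(system_id):
    # With 50 contiguous IDs per character, system ID n occupies
    # tokens (n - 1) * 50 + 1 through n * 50.
    first = (system_id - 1) * PFPS_PER_CHARACTER + 1
    return first, first + PFPS_PER_CHARACTER - 1

print(token_range(148))  # (7351, 7400) - matches the Hacker Smurf example
```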

The outcome of this phase was the generation of the root provenance hash that is used as a reference based on the original order of the 10k PFPs. This provenance hash is saved on-chain under the reference __unshuffledProvenanceHash. 

As the transaction was added to a block, that block’s hash was used as the seed for the second phase to shuffle all PFPs. In addition, a reference to a future block was saved to define which blockhash would be used for phase 3.

Phase 2

This phase is the most critical as we needed to properly distribute the PFPs with a valid ID guaranteeing the semi-revealed constraint. To do so, we followed the steps defined below:

  • shuffled numbers 1 to 50 as shown before

  • for each Smurf, we took the first 25 (from the shuffled sequence) and saved them with their new IDs. Based on the shuffling example we provided, PFP # 24 was moved to position # 1 for each Smurf. Going back to the Hacker Smurf example, his new range was now 3676 to 3700 and PFP 3676 became associated with image # 24 of this Smurf. 

  • then we took all the remaining PFPs (the unassigned half of each Smurf) with their old IDs and shuffled them again with the same seed. This was to ensure that the Smurf characters wouldn’t be contiguous in the unrevealed range. 

The outcome of this phase was a new provenance hash, but more importantly, a mapping that fully aligned with what we have on-chain.
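The crystal half of this mapping can be sketched in Python (a stand-in seed is used here; the real process used the on-chain seed and, per the article, Scala). Placing 25 slots per character at (system_id - 1) * 25 + 1 onward reproduces the Hacker Smurf range 3676 -> 3700:

```python
import random

PER_CHARACTER = 50
CRYSTAL_SLOTS = 25  # half of each character backs crystal (semi-revealed) tokens

def crystal_range(system_id):
    # 25 contiguous slots per character in the semi-revealed half (1..5000).
    first = (system_id - 1) * CRYSTAL_SLOTS + 1
    return first, first + CRYSTAL_SLOTS - 1

def assign_crystal_images(system_id, seed):
    # Shuffle this character's 50 image indices and keep the first 25
    # for the crystal slots, mapping new token ID -> image index.
    rng = random.Random(seed)
    images = list(range(1, PER_CHARACTER + 1))
    rng.shuffle(images)
    first, _ = crystal_range(system_id)
    return {first + i: images[i] for i in range(CRYSTAL_SLOTS)}

print(crystal_range(148))  # (3676, 3700) - matches the Hacker Smurf example
```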

Phase 3

This phase was very similar to the previous one, except that we only shuffled within the groups assigned in the previous phase:

  • shuffled numbers 1 to 25 as shown before

  • for each Smurf, we re-assigned tokens based on the new position

  • for the non-crystal PFPs, we shuffled them again

The outcome of this phase was the final provenance hash that was saved on-chain under the reference __provenanceHash.

As we finished the job, we realized that phase 3 was extra and that we probably could have achieved the same outcome with just phases 1 & 2. The main reason for phase 3 was to guarantee non-contiguous token ranges in the range 5001 to 10,000 and ensure a good distribution.

Summary

As we generate all this data, we will make it available for consumption, and some snippets of code will be shared to describe the key steps of the process. The results look like a table containing the information you can find here.

Our goal here is to provide total transparency and give our community confidence that our distribution has been fair. We are just as excited as you are to see which Smurf we’ll get with the reveal! But in true Web3 fashion, you don’t need to take our word for it. You can see for yourself.
