Cleaning 70k entries

My chest was tight when we pushed the raffle page live. The posse has taken down our website before and this was probably the biggest thing we’d launched since the Test Subject DB. We’d learnt a lot since then; we now have a whole load testing process where we can model the volume we have come to expect and build our tech accordingly.

On the flip side I also didn’t know how quick uptake would be! As we all know, the market is weird right now. After I hit the deploy button I refreshed our database every 5 minutes, watching the entries come in and texting Joe a running count. I did not expect we’d hit almost 70k unique entries in 48 hours. We couldn’t celebrate just yet, though: it was time to clean the list of bots and malicious entries.

It’s pretty expected that a project with our “hype” would have lots of people trying to bot it, multi-wallet or abuse it in general. We actually received some screenshots of developers writing and testing their code! As a dev myself, it was interesting to see that the number of bot entries grew exponentially as time went on; developers had clearly been working on programmatically interacting with the page or structuring their API requests through trial and error, finally cracking it towards the end of the raffle window. We knew this was going to happen, and we were ready.

In the end we had around 3x the wallets that we were aiming for, all ready to mint. I wanted to share some of the insights and techniques from doing this, so that we can spread our knowledge to other collections launching in the future:

P1: Data Collection

I had a rough idea of the bot volume as the raffle went on because we were using Google reCAPTCHA, which gave every entry a score from 0 to 1 (0 being likely a bot, 1 being likely a human). Google bases this on a bunch of factors that I won’t go into, but in short, it’s clever. We also collected a hashed version of the IP (you hash it to avoid storing personal information on the server while still being able to check for uniqueness), and from the Discord and Twitter authentication we could check the ages of accounts. We enforced uniqueness on Discord, Twitter and ETH address. Everything else was used for context, rather than as hard cut-offs. In general we didn’t want to exclude genuine people who, for example, may have only figured out what Discord and NFTs were in the last couple of weeks.
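To make the IP hashing idea concrete, here is a minimal sketch. The post doesn't say which hash or key we used, so treat everything here as an assumption: `SECRET_KEY` and `hash_ip` are hypothetical names, and the keyed SHA-256 (HMAC) is just one reasonable choice. A key matters because the IPv4 address space is small enough that plain, unkeyed hashes could be reversed by brute force.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"

def hash_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (hex) of an IP address.

    Only the digest is stored, so entries can be compared for
    uniqueness without keeping the raw address on the server.
    """
    return hmac.new(key, ip.strip().encode("utf-8"), hashlib.sha256).hexdigest()

first = hash_ip("203.0.113.7")
again = hash_ip("203.0.113.7")
other = hash_ip("203.0.113.8")
print(first == again)  # True: same IP always maps to the same digest
print(first == other)  # False: different IPs map to different digests
```

The same digest for the same address is all that is needed to count entries per location later on.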

P2: Scoring

We worked with gMan who has de-botted raffles recently for projects like Boki, Lacoste and Sneaker Heads (as well as working with RCC and others with his other product: NFT Sentry). I won’t go into all the specifics as that is his secret sauce but I’ll tell you about some interesting decisions we made. Mostly because I found it fascinating!

Wallet Balances/ Multi-walleting

One of the key ways to bot a raffle is to move ETH around. You enter with one wallet, transfer the 0.35 ETH to another and then enter again. This bypasses any validation you have. Luckily this is fairly easy to detect.

Firstly, you can look at wallet funders: detect which wallet funded each entry, then jump back as many transactions as you want to build a picture of how wallets relate to each other. Our longest chain of ETH transfers was 124 wallets long! And a single wallet funded over 90 raffle submissions.

A snapshot of one of our wallet chains

There are of course thresholds to this, as friends or partners pay each other in ETH all the time. That’s why very few of these things are hard cut-offs but instead context that we can combine with other factors that help us to make informed decisions.
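The funder idea can be sketched as a walk up a "who funded whom" map. This is an illustration only, not gMan's actual method: `funder_of`, `funding_clusters`, `max_hops` and the cluster-size threshold are all hypothetical, and the short letters stand in for wallet addresses. It also assumes the funder map has already been built from transaction history, and a real version would need to ignore widely shared funders such as exchange hot wallets.

```python
from collections import defaultdict

def funding_clusters(
    funder_of: dict[str, str],
    entrants: set[str],
    max_hops: int = 10,
) -> dict[str, list[str]]:
    """Group raffle entrants by their furthest upstream funder within max_hops."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for wallet in sorted(entrants):
        root, seen = wallet, {wallet}
        for _ in range(max_hops):
            parent = funder_of.get(root)
            if parent is None or parent in seen:  # end of chain, or a loop
                break
            seen.add(parent)
            root = parent
        clusters[root].append(wallet)
    return dict(clusters)

# A funded B and D, B funded C, and E funded F.
funder_of = {"B": "A", "C": "B", "D": "A", "F": "E"}
entrants = {"B", "C", "D", "F"}

clusters = funding_clusters(funder_of, entrants)
print(clusters)  # {'A': ['B', 'C', 'D'], 'E': ['F']}

# Hypothetical threshold: only clusters of 3+ entries are worth a closer look.
suspicious = {root: ws for root, ws in clusters.items() if len(ws) >= 3}
print(suspicious)  # {'A': ['B', 'C', 'D']}
```

The threshold is what keeps friends or partners who occasionally send each other ETH from being flagged, and even then a large cluster is context to combine with other signals rather than a verdict.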

Secondly, we looked at balances, and whether you actually had ETH in your wallet when you entered (technically it was possible to spoof it on the client to get round validation at the time). We didn’t just look at ETH, though; we also looked at stables. Especially in the bear market right now, a lot of people transferred into ETH to enter the raffle and then back into stables. In hindsight I should have checked stables on entry and not just ETH, and I would encourage other projects to do this for future raffles in the current market!

Both of these gave us a picture of genuine funding.
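A balance check that counts stables as well as ETH might look like the sketch below. It is a rough illustration under assumptions: the 0.35 ETH figure comes from the post, but `has_genuine_funding`, the ETH price and the sample balances are hypothetical, and the balances are assumed to have been fetched server-side already (not trusted from the client). Gas is ignored.

```python
from decimal import Decimal

REQUIRED_ETH = Decimal("0.35")  # the amount mentioned in the post

def has_genuine_funding(
    eth_balance: Decimal,
    stable_balances_usd: dict[str, Decimal],
    eth_price_usd: Decimal,
    required_eth: Decimal = REQUIRED_ETH,
) -> bool:
    """True if ETH plus the ETH-equivalent of stablecoins covers the requirement."""
    stables_usd = sum(stable_balances_usd.values(), Decimal("0"))
    eth_equivalent = eth_balance + stables_usd / eth_price_usd
    return eth_equivalent >= required_eth

price = Decimal("1500")  # hypothetical ETH/USD price

# Holds mostly stables: passes once stables are counted.
print(has_genuine_funding(Decimal("0.05"), {"USDC": Decimal("600")}, price))  # True
# Same ETH balance with no stables: fails.
print(has_genuine_funding(Decimal("0.05"), {}, price))  # False
```

Like everything else here, a failed check is better treated as one more piece of context than as an automatic rejection.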

IPs

I mentioned earlier that we collected IP hashes. Generally this allows you to spot when one person is sending multiple entries from the same location. We didn’t force uniqueness on this, as I’ve seen enough stories of partners in the same household wanting to mint and being refused; however, it’s unlikely that there is a 4 or 5 person household all entering from the same IP. More context.
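Counting entries per IP hash is simple to sketch. The limit of 3 below is an assumption drawn from the "4 or 5 person household" remark rather than the exact number we used, and `crowded_ip_hashes` and the entry fields are hypothetical names.

```python
from collections import Counter

# Hypothetical limit: up to 3 entries per IP hash are treated as a normal household.
HOUSEHOLD_LIMIT = 3

def crowded_ip_hashes(entries: list[dict], limit: int = HOUSEHOLD_LIMIT) -> dict[str, int]:
    """Return IP hashes that appear on more than `limit` entries, with their counts."""
    counts = Counter(e["ip_hash"] for e in entries if e.get("ip_hash"))
    return {ip_hash: n for ip_hash, n in counts.items() if n > limit}

entries = (
    [{"wallet": f"0xhome{i}", "ip_hash": "aaa"} for i in range(2)]
    + [{"wallet": f"0xfarm{i}", "ip_hash": "bbb"} for i in range(5)]
    + [{"wallet": "0xnoip", "ip_hash": None}]  # no hash recorded: not counted
)
print(crowded_ip_hashes(entries))  # {'bbb': 5}
```

The two-entry household passes untouched, while the five entries behind one hash get flagged for a closer look alongside the other signals.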

reCAPTCHA

reCAPTCHA can be blocked by ad-blockers and other privacy software, so if we didn’t have a score for your entry we didn’t penalise you.

Polygraph

We put the polygraph questions in as basic questions that were easy to answer if you had some interest in the project. Ultimately we wanted winners of the raffle to be biased towards community members that had missed out on Test Subject status. We only cut off entries that got them all wrong; otherwise the answers just affected the overall scoring.
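To show how soft signals and a single hard cut-off can sit together, here is a toy scoring sketch. The real weights are gMan's secret sauce and are not reproduced here: every number, field and name below (`Entry`, `score`, the 0.5 and 1.0 penalties, the limits of 3) is made up for illustration. The only parts taken from the post are the rules themselves: a missing reCAPTCHA score is not penalised, and only entries with every polygraph answer wrong are cut.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    recaptcha: Optional[float]  # 0.0 (likely bot) to 1.0 (likely human); None if blocked
    quiz_correct: int           # polygraph answers the entrant got right
    quiz_total: int             # polygraph questions asked
    shared_ip_count: int        # entries seen from the same IP hash
    funding_cluster_size: int   # entries sharing an upstream funder

def score(entry: Entry) -> Optional[float]:
    """Return None for a hard cut-off, else a score where higher looks more genuine."""
    if entry.quiz_total > 0 and entry.quiz_correct == 0:
        return None  # the one hard cut-off: every polygraph answer wrong
    total = 0.0
    if entry.recaptcha is not None:
        total += entry.recaptcha - 0.5  # centred, so a missing score is neutral
    if entry.quiz_total > 0:
        total += entry.quiz_correct / entry.quiz_total
    if entry.shared_ip_count > 3:       # hypothetical household limit
        total -= 0.5
    if entry.funding_cluster_size > 3:  # hypothetical cluster limit
        total -= 1.0
    return total

print(score(Entry(0.75, 4, 4, 1, 1)))   # 1.25
print(score(Entry(None, 2, 4, 1, 1)))   # 0.5
print(score(Entry(0.25, 2, 4, 5, 40)))  # -1.25
print(score(Entry(0.75, 0, 4, 1, 1)))   # None
```

Entries would then be ranked or thresholded on the score, with the borderline ones reviewed by hand.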

Summarising

What has struck me doing this is how shocking it is that any project can do a fully open public mint. Perhaps at a certain scale bot developers aren’t interested, but even if there is a small risk, it doesn’t seem worth it, even if you’re doing a raffle where “everybody” wins.

All of the above helped us build context on every entry and form a holistic picture of which entries were likely bots and which were real people. Because of this, I have a high degree of confidence that those minting in our public allocation will be genuine holders. If you are a lucky winner, it’s worth reminding you that the public mint will be oversubscribed. I know, I know, and I’m sorry. But no-one wants a slow mint or multiple complex phases, and even with the oversubscription the gas will be fine and much better than if there was no raffle at all (as seen above) - more on this in a future post.

More details on mint day (30th) timings/ phases coming soon.


P.S. For Holders

Everyone who entered our raffle and whom we deemed not malicious had a chance to win. But it’s worth saying that all of the above wallet intel could also be used to start to look at your behaviour as a holder and filter on that. I say that only because I reckon most people don’t know that? I certainly didn’t think about it before this, so it’s worth being aware of what your wallet says about you as you enter future raffles.

P.S. For Projects

We now have a fairly good dataset of wallet addresses and Discord and Twitter accounts that are bots or malicious. I’m more than happy to share this with any project creator who reaches out.
