TLSN 🤝 Duolingo Streak

September 22nd, 2024

Hey there!

So I’ve been a Duolingo user for a while now, and my streak has surpassed four years. I’ve been learning some languages, mainly focusing on German.

When you reach milestones like 1000 days, Duolingo gives you an image celebrating the achievement. When I saw this, I realized there’s an opportunity to create an NFT or something similar.

But I don’t work at Duolingo (or at least I didn’t when I wrote this post), so I started thinking: how can I prove to someone that I have this streak without relying on Duolingo’s server? In other words, how can I generate a proof that on a specific day, I had a specific streak?

I’ve been studying Programmable Cryptography for some months now, and there’s a project I’ve been wanting to use for some time: TLS Notary (TLSN).

TLSN “leverages the widely-used Transport Layer Security (TLS) protocol to securely and privately prove that a transcript of communications with a web server took place.” More information about this project can be found on the PSE webpage.

This is exactly what I wanted to do for this project: generate a proof that I asked Duolingo’s server for my current streak, and they responded with the number.

So, I started learning how to use this tool.

Fortunately, the team behind TLSN provides many tools and resources online to learn how to use it. I found an example that I used as a template and modified it to fit my needs. This was the Discord DM example from TLSN.

All my code is in this repo. It’s still a work in progress, so I will keep you posted on future changes.

I will try my best to explain how the code works, which parts were written by the TLSN team, and which parts I had to modify to make it work for my project.

Step 1: Finding the right endpoint

The first thing I had to do was find where the browser was getting the streak information to display on Duolingo’s frontend. After inspecting the webpage, I found the request I was looking for:

https://www.duolingo.com/2017-06-30/users/{user_id}

As you might expect, user_id is a unique identifier assigned to every user on the platform. If you make a GET request to this endpoint without authentication, you’ll get some general information about the user. However, if you include an authentication token in the request, you will receive more private information, like your email.

What I was trying to prove is that you have a certain streak, not just that you know a user_id associated with that streak. So, I made the request to this endpoint with authentication, and I used some of the private information (specifically the email) as evidence that the user generating the proof holds the authentication token, which most likely means they own the account.

Step 2: Modifying the TLSN Discord DM example

Next, I cloned the repository and started making some changes.

First, I ran the TLSN server locally. The full tutorial is in this link.

Then I made some simple modifications to the discord_dm.rs file.

The first change was updating the SERVER_DOMAIN from discord.com to duolingo.com. I then adjusted the data needed for the server to make the request and notarize it. In the Discord example, you need channel_id, authorization, and user_agent. For this case, I kept authorization and user_agent but replaced channel_id with user_id. I then loaded these three environment variables into runtime variables.

This data is used to construct the URL for the request:

https://{SERVER_DOMAIN}/2017-06-30/users/{user_id}
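Here is a minimal sketch of that setup using only the standard library. The environment variable names (USER_ID, AUTHORIZATION, USER_AGENT), the Settings struct, and the helper functions are assumptions made for illustration; the real example may name and load them differently. The host is the one from the endpoint found in Step 1.

```rust
use std::env;

// Host taken from the endpoint found in Step 1.
const SERVER_DOMAIN: &str = "www.duolingo.com";

/// Values needed to build the request (hypothetical struct for this sketch).
struct Settings {
    user_id: String,
    auth_token: String,
    user_agent: String,
}

/// Read the three values through a lookup function, so the logic can be
/// exercised without touching the real process environment.
fn load_settings(lookup: impl Fn(&str) -> Option<String>) -> Result<Settings, String> {
    let get = |key: &str| lookup(key).ok_or_else(|| format!("missing env var {key}"));
    Ok(Settings {
        user_id: get("USER_ID")?,
        auth_token: get("AUTHORIZATION")?,
        user_agent: get("USER_AGENT")?,
    })
}

/// Build the URL of the user endpoint for a given user id.
fn user_url(user_id: &str) -> String {
    format!("https://{SERVER_DOMAIN}/2017-06-30/users/{user_id}")
}

fn main() {
    match load_settings(|key| env::var(key).ok()) {
        Ok(s) => {
            println!("GET {}", user_url(&s.user_id));
            // Only report lengths so the secrets never reach the terminal.
            println!("authorization: {} bytes", s.auth_token.len());
            println!("user-agent: {} bytes", s.user_agent.len());
        }
        Err(e) => eprintln!("{e}"),
    }
}
```

Failing early with a clear message when a variable is missing is easier to debug than a failed request later in the notarization flow.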

So, when I ran the command RUST_LOG=debug,uid_mux=INFO,yamux=info cargo run --release --example duolingo_streak, expecting everything to work, I encountered this error:

error: kind Config, msg: max received transcript size exceeded: 24659 > 1638

This basically means that the response was larger than the maximum amount of received data the session was configured to accept.

At the beginning of the file, there are comments explaining that you can increase the limits on sent and received data. I tried changing these limits on the server, but in the end I decided to sidestep the problem instead. (I still need to research why these size limits exist and what can be done to raise them. I plan to write about it once I understand the basics of TLSN better.) So, I figured the endpoint might accept some query parameters, and indeed it did. I modified the URL to this:

https://{SERVER_DOMAIN}/2017-06-30/users/{user_id}?fields=streak,email

Now, the only data requested from the server was the streak and the email.
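As a small sketch, the filtered path can be built from a list of field names. The function name user_path and the sample user id are made up for illustration; only the path and the fields parameter come from the URL above.

```rust
/// Build the request path, optionally restricting the response to some fields.
fn user_path(user_id: &str, fields: &[&str]) -> String {
    let base = format!("/2017-06-30/users/{user_id}");
    if fields.is_empty() {
        // No filter: the server returns the full (large) user object.
        base
    } else {
        format!("{base}?fields={}", fields.join(","))
    }
}

fn main() {
    // "123456" is a placeholder user id, not a real account.
    println!("{}", user_path("123456", &["streak", "email"]));
}
```

Requesting fewer fields keeps the response small, which is what brought the received transcript back under the size limit.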

And this worked!

I can now generate a proof that I asked Duolingo’s server for the streak and email and received that data in the response. I can even verify it without modifying the verifier’s Rust file.

But I wasn’t done yet. I didn’t want to expose the email address when showing the proof, so I needed to hide some of the data.

Step 3: Hiding Sensitive Data

In the Discord example, the authorization token is hidden. I wanted to hide all data referencing the user, so I needed to hide the user_id in the request URL, the email address from the response, and keep hiding the authorization token.

This took some time to figure out because in the example, the request contains hidden data, but the response is fully revealed.

Using the code:

    // Identify the ranges in the transcript that contain secrets
    let (public_ranges, private_ranges) = find_ranges(
        prover.sent_transcript().data(), 
        &[auth_token.as_bytes(), user_id.as_bytes()]
    );

The token and user_id are identified in the request, and the ranges are committed.
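To make the splitting concrete, here is a minimal sketch of what a helper like find_ranges can do. It is my own illustration and not necessarily identical to the helper shipped with the TLSN example; the sample request bytes and secrets are made up.

```rust
use std::ops::Range;

/// Split `seq` into public and private ranges. The private ranges are every
/// occurrence of any byte string in `secrets`; the public ranges are the rest.
fn find_ranges(seq: &[u8], secrets: &[&[u8]]) -> (Vec<Range<usize>>, Vec<Range<usize>>) {
    // Collect every occurrence of each secret, in the order the secrets are given.
    let mut private_ranges = Vec::new();
    for secret in secrets {
        if secret.is_empty() {
            continue; // `windows(0)` would panic
        }
        for (start, window) in seq.windows(secret.len()).enumerate() {
            if window == *secret {
                private_ranges.push(start..start + secret.len());
            }
        }
    }

    // Everything between the private ranges (sorted by position) is public.
    let mut sorted = private_ranges.clone();
    sorted.sort_by_key(|r| r.start);
    let mut public_ranges = Vec::new();
    let mut cursor = 0;
    for r in &sorted {
        if r.start > cursor {
            public_ranges.push(cursor..r.start);
        }
        cursor = cursor.max(r.end);
    }
    if cursor < seq.len() {
        public_ranges.push(cursor..seq.len());
    }
    (public_ranges, private_ranges)
}

fn main() {
    // Made-up request with a fake user id ("42") and a fake token ("abc").
    let request = b"GET /users/42 HTTP/1.1\r\nauthorization: Bearer abc\r\n\r\n";
    let (public, private) = find_ranges(request, &["abc".as_bytes(), "42".as_bytes()]);
    println!("public: {public:?}");
    println!("private: {private:?}");
}
```

Note that the private ranges come back in the order the secrets were passed in, not in the order they appear in the transcript. That ordering matters later when picking which commitments to keep hidden.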

The email was trickier because it is in the response rather than the request, but I reused the same approach on the received transcript, leading to:

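    // Pull the email out of the parsed JSON response so it can be redacted
    let email = parsed["email"]
        .as_str()
        .expect("response should contain an email field");
    // Identify the ranges in the received transcript that contain secrets
    let (public_ranges_recv, private_ranges_recv) = find_ranges(
        prover.recv_transcript().data(),
        &[email.as_bytes()],
    );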

Then, the commitment IDs are collected for both the request and response, the session is notarized, and all but commitments 3, 4, and 7 are revealed, which correspond to the auth_token, user_id, and email respectively.
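The indices 3, 4, and 7 fall out of the order in which the ranges are committed. The sketch below works through that arithmetic, assuming commitments are created for the public ranges first and the private ranges second, for the request and then for the response (this is how I understand the example to work). The function name and the chunk counts are illustrative.

```rust
/// Return the commitment indices that correspond to private ranges, assuming
/// commitments are created in this order: sent public, sent private,
/// received public, received private.
fn private_commitment_ids(
    sent_public: usize,
    sent_private: usize,
    recv_public: usize,
    recv_private: usize,
) -> Vec<usize> {
    let sent_start = sent_public;
    let recv_start = sent_public + sent_private + recv_public;
    (sent_start..sent_start + sent_private)
        .chain(recv_start..recv_start + recv_private)
        .collect()
}

fn main() {
    // Request: 3 public chunks around 2 secrets (auth_token, user_id).
    // Response: 2 public chunks around 1 secret (email).
    let hidden = private_commitment_ids(3, 2, 2, 1);
    println!("{hidden:?}"); // prints [3, 4, 7]
}
```

With those counts, commitments 0 to 2 are the public parts of the request, 3 and 4 are its secrets, 5 and 6 are the public parts of the response, and 7 is the email. If a secret appeared more than once, or sat at the very start or end of a transcript, the counts and therefore the indices would shift.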

In this way, the first iteration of the project was complete. I was able to generate a proof of a streak in Duolingo, while hiding user data.

Future work

Next, I want to explore how to use this proof effectively.

The main problem I foresee is how to prevent a user from generating multiple proofs every day and sharing them with others. I think I still need a way to check the user, perhaps using an identifier. However, I want to maintain the user’s privacy, so I’m still considering the best approach.

The other step would be to use PSE’s server to notarize the request. Currently, the verifier must trust the prover, because the prover is also the one running the notary server. Notarizing on PSE’s server would move that trust to an independent party. However, I’m still evaluating the best way to address this issue.

Stay tuned for part 2 in the near future.

