Lens Protocol Primer and Data Analysis

Decentralized Social Media Primer

Decentralized social media is a very relatable topic and an excellent example of how blockchains can help drive positive social change.

Let’s contextualize why it matters by looking at the current problems we face with centralized platforms like Twitter.

Unpacking the Problems - Twitter

Twitter sees a huge number of tweets per day: as far back as 2013, Twitter reported 500 million daily tweets. As far back as 2014, over 83% of all United Nations member countries had a presence on Twitter, with the majority of heads of government on the platform. Twitter has become a source of information that could easily be considered a public good. But it’s not, and there are several problems to consider:

  • Developers are limited to retrieving only 1,200 tweets every hour, which would only allow someone else to build a platform containing roughly 0.006% or less of the tweets shown on the official website each day.

    • This prevents any real innovation or competition. For example, someone could not build a platform where users pick their own transparent algorithm for their social media feeds or adjust the ads they are served.

    • These restrictions are important to Twitter’s ability to monetize its users, letting it monetize them as much as possible without pushing them to stop using the platform completely. In fact, it’s common for huge social media companies to start out without monetizing their users, as when YouTube didn’t show any ads on videos. Their challenge is capturing a large user base, and monetization becomes a non-issue once they have done so.

  • It’s really challenging for users to leave social media platforms because they have well-established followings and connections, which would have to be rebuilt from scratch on a new platform.

  • Twitter owns all the data in its own centralized databases, which makes them huge targets for hackers and offers no resilience against the data being deleted. If Twitter made the business decision to delete all tweets older than one year, it could freely do so.

  • There’s risk that comes from these business models. Twitter and other companies have a responsibility towards their investors but not their users. Large losses are never good for users: in the quarter that had just ended at the time of writing, Twitter lost almost $350 million. This raises the question of what would happen if Twitter as a company went bankrupt while sitting on billions of dollars’ worth of user data.
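A quick sanity check of the coverage figure in the first bullet, using only the two numbers quoted in the text (1,200 tweets per hour and 500 million tweets per day). Treating the cap as a flat hourly limit is a simplifying assumption.

```python
# Back-of-the-envelope check of the API coverage figure.
# Both constants come from the text above; the flat hourly cap is an assumption.
TWEETS_PER_HOUR_CAP = 1_200
DAILY_TWEETS = 500_000_000  # 2013 figure; a higher volume makes coverage even smaller


def api_coverage_percent(per_hour_cap: int, daily_total: int) -> float:
    """Percentage of one day's tweets retrievable under an hourly cap."""
    retrievable_per_day = per_hour_cap * 24
    return retrievable_per_day / daily_total * 100


coverage = api_coverage_percent(TWEETS_PER_HOUR_CAP, DAILY_TWEETS)
print(f"{coverage:.5f}%")  # prints 0.00576%
```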

All of the problems outlined above are solved by up-and-coming decentralized social media protocols like Lens Protocol and Orbis. These blockchain protocols provide a base layer that any developer can build on top of, without locking users into a particular platform. A user’s social network sits at the base blockchain layer rather than in the platform they are using. This means they can move between competitors as they please while keeping their connections, and platforms can leverage the same underlying social graph.

Lens Protocol Ecosystem

Today Lens Protocol already has a thriving ecosystem of different platforms and tooling.

Here are some examples of platforms that take the same underlying social graph and present the information differently depending on the platform’s purpose, all of which work well today:

“Decentralized Twitter”:

“Decentralized YouTube”:

“Decentralized LinkedIn”:

Sustainable Business Models

So what would decentralized social media need to get right in order to not only create better systems but also more sustainable business models that don’t work directly against their users?

There are several independent layers that need to function and be self-sustaining:

  • Where transactions are written, in a way which can’t be arbitrarily modified. Blockchains like Ethereum and Polygon perform this role, and they have sustainable models for verifying and storing accurate data ✅

  • Where the content itself is written. Blockchains like Ethereum aren’t good for storing things like videos, blog posts, social media posts, etc., so we need decentralized storage solutions. Today we have IPFS, as well as more permanent blockchain storage solutions like Arweave ✅

  • How the data can be accessed. Blockchain data can be difficult to access, and this is the problem The Graph solves ✅

  • Platforms which maintain websites to digest the data and make it available to users to interact with. This is the main aspect we’ll continue to see a lot of growth around.

How will these platforms offset operating costs and be profitable while leaving all the power in the hands of the users under this new model?

Sustainable Social Media Platforms

It still largely remains to be seen how developers will tackle this problem, but as an example, what could the Twitter/Facebook/YouTube advertising model look like with decentralized social media?

Users could specifically opt in to advertising. The users themselves could keep a large portion of the ad revenue and turn a profit on their own data, with the platform earning a small percentage. This model creates much healthier competition than what we see today. Another platform could come along which keeps less of the ad revenue. Or another could come along which gives users more control over the ads they receive. Or another one could differentiate itself with a different newsfeed algorithm, an improved user experience, or in some other way. As platforms with different value propositions come along, users can freely move between them, or use several at once as they please.
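To make the opt-in model concrete, here is a toy sketch of how revenue from a single ad might be split. The `platform_fee` parameter and the function name are hypothetical illustrations of the idea above, not part of Lens Protocol or any existing platform.

```python
from dataclasses import dataclass


@dataclass
class AdPayout:
    user: float
    platform: float


def split_ad_revenue(revenue: float, platform_fee: float, opted_in: bool) -> AdPayout:
    """Split revenue from one ad between the user and the platform.

    platform_fee is a HYPOTHETICAL fraction the platform keeps (0.1 = 10%).
    Users who have not opted in are never shown ads, so nothing is earned.
    """
    if not 0.0 <= platform_fee <= 1.0:
        raise ValueError("platform_fee must be between 0 and 1")
    if not opted_in:
        return AdPayout(user=0.0, platform=0.0)
    platform_cut = revenue * platform_fee
    return AdPayout(user=revenue - platform_cut, platform=platform_cut)


# Two competing platforms that differ only in the fee they keep
for fee in (0.10, 0.05):
    print(fee, split_ad_revenue(2.00, fee, opted_in=True))
```

A platform competing on price simply lowers `platform_fee`; because users can switch freely, that fee is what platforms compete over.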

This is just one way of funding a platform, and anyone can build their own solution. This creates a lot more innovation and choice for the users.

There aren’t big disadvantages from a user’s perspective. Users can retain their right to privacy where needed, for example on private messages, by storing the data on the blockchain in encrypted form that only the intended users can access. They may have to spend something like $0.0001 to make a post, or $0.30 for a 100MB video. These are costs that big platforms can easily absorb, and that users themselves could easily offset if they wanted to monetize through advertising or other means. The benefits around privacy, control, and composability far outweigh these disadvantages.

Data Analysis

Because all Lens Protocol transactions are recorded on the Polygon blockchain, we can leverage The Graph to freely analyze the data without limitations.
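A minimal sketch of pulling publications from a subgraph and counting them per day, written in Python and not taken from the analysis code. The endpoint URL and the `publications` entity with its `type`, `timestamp`, and `contentURI` fields are hypothetical placeholders; check them against the schema of the Lens subgraph you actually query. The pagination uses The Graph’s `first` and `id_gt` filter pattern.

```python
import json
import urllib.request
from collections import Counter
from datetime import datetime, timezone

# HYPOTHETICAL endpoint: replace with the URL of the Lens subgraph you use.
SUBGRAPH_URL = "https://example.com/subgraphs/name/lens-protocol"

# HYPOTHETICAL entity and field names; verify them against the subgraph schema.
QUERY = """
query Publications($first: Int!, $lastId: String!) {
  publications(first: $first, orderBy: id, orderDirection: asc,
               where: { id_gt: $lastId }) {
    id
    type
    timestamp
    contentURI
  }
}
"""


def build_request(first: int, last_id: str) -> bytes:
    """Encode one GraphQL request body as JSON."""
    body = {"query": QUERY, "variables": {"first": first, "lastId": last_id}}
    return json.dumps(body).encode("utf-8")


def fetch_page(first: int, last_id: str) -> list:
    """POST one page of the query and return its list of publications."""
    req = urllib.request.Request(
        SUBGRAPH_URL,
        data=build_request(first, last_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]["publications"]


def fetch_all(page_size: int = 1000) -> list:
    """Page through all publications, using the last id seen as a cursor."""
    results, last_id = [], ""
    while True:
        page = fetch_page(page_size, last_id)
        results.extend(page)
        if len(page) < page_size:
            return results
        last_id = page[-1]["id"]


def daily_counts(publications: list) -> dict:
    """Count publications per (UTC day, type), e.g. ('2022-10-18', 'post')."""
    counts = Counter()
    for pub in publications:
        ts = datetime.fromtimestamp(int(pub["timestamp"]), tz=timezone.utc)
        counts[(ts.date().isoformat(), pub["type"])] += 1
    return dict(sorted(counts.items()))
```

Against a real endpoint, `daily_counts(fetch_all())` would produce the per-day post and comment series plotted in the chart.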

For example, we can look at the number of posts and comments over time across all the different platforms built on the underlying Lens Protocol smart contracts:

Comment and Post counts through Oct 18

There are a number of stats we could visualize, but one of the more interesting topics to take a deeper dive into is where the content itself is stored.

Storage Analysis

Blockchains like Ethereum and Polygon are not a good solution for storing content like photos, videos, social media posts, or full websites. Instead, the transaction stores a link to the content itself. The content is uploaded to a decentralized storage solution or, in some cases, to centralized storage, which largely defeats the purpose. This has been a big problem for NFTs in particular, where users own NFTs whose content is no longer accessible. One popular decentralized storage solution is IPFS, which allows anyone in the world to store content wherever they’d like; as long as someone makes one copy of the content available, it can be accessed. Posting to IPFS is free, but if no one stores the file or makes it available, it becomes inaccessible. For that reason, a better solution is to post the content to a blockchain that specializes in storage, like Arweave, where users pay once to have the content stored forever.
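Since each publication only records a link, grouping publications by storage medium comes down to inspecting that URI. The sketch below is a Python illustration, not the analysis code; the prefixes and hostnames it checks are assumptions about common URI shapes rather than an exhaustive list, and matching "phaver" in the hostname is a hypothetical heuristic for the Phaver app’s centralized API.

```python
from urllib.parse import urlparse


def storage_medium(content_uri: str) -> str:
    """Bucket a publication's content URI by where the content lives.

    The checks below are ASSUMED URI shapes (ar://, arweave.net,
    ipfs://, gateway /ipfs/ paths); anything unrecognized is 'other'.
    """
    uri = content_uri.strip()
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()
    if scheme == "ar" or host == "arweave.net" or host.endswith(".arweave.net"):
        return "arweave"
    if scheme == "ipfs" or "/ipfs/" in uri:
        return "ipfs"
    if "phaver" in host:  # HYPOTHETICAL heuristic for Phaver's own API
        return "phaver"
    return "other"
```

Counting the results with `collections.Counter(storage_medium(p["contentURI"]) for p in publications)` would give the per-medium totals.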

Platforms on Lens Protocol can decide where to store the content. So where is content being stored?

Counts by Storage Medium through Oct 18

The content for Lens Protocol is being stored across Arweave, IPFS, and centralized data storage (the Phaver social media app). It’s also interesting that the Phaver app in particular seems to have a disproportionate number of posts relative to comments compared to other platforms like Lenster. My best hypothesis is that this is caused by the “Phaver points” they introduced (which will eventually be tied directly to a token with value behind it), and that posting is a much better path to acquiring them.

So, across the three storage mediums, how much of the content is actually available, and, just as importantly, how much can be retrieved in less than half a second?
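Answering this means timing each request against the half-second cutoff. The following Python sketch shows one way to label each link as fast, slow, or dead; it is not the author’s code. It assumes `ar://` and `ipfs://` URIs have already been rewritten to HTTP gateway URLs (plain `urllib` cannot open those schemes and would report them as dead), and the 10-second hard timeout is an arbitrary choice.

```python
import time
import urllib.request
from typing import Callable

THRESHOLD_SECONDS = 0.5  # the half-second cutoff used in the analysis


def urllib_fetch(url: str, timeout: float) -> bytes:
    """Download a URL with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def check_retrieval(url: str,
                    fetch: Callable[[str, float], bytes] = urllib_fetch,
                    threshold: float = THRESHOLD_SECONDS,
                    timeout: float = 10.0) -> str:
    """Label a content URL as 'fast', 'slow', or 'dead'.

    urllib's URLError/HTTPError and socket timeouts are OSError
    subclasses; ValueError covers malformed URLs.
    """
    start = time.perf_counter()
    try:
        fetch(url, timeout)
    except (OSError, ValueError):
        return "dead"
    elapsed = time.perf_counter() - start
    return "fast" if elapsed <= threshold else "slow"


def percent_fast(labels: list) -> float:
    """Share of URLs, in percent, retrieved within the threshold."""
    return 100.0 * labels.count("fast") / len(labels) if labels else 0.0
```

Keeping "slow" separate from "dead" matters here: slow content still exists and can be cached, while dead links are gone.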

Percent of content retrieved within half a second - through Oct 18

There are a few aspects that make this interesting:

  • The Phaver API is centralized and extremely performant. However, almost 5% of the publications point to completely dead links. This is not necessarily a bad thing. Users have the choice of being on a platform where moderators can step in to delete illegal content, and where users can delete their own content as they please (blockchain permanence may not always be optimal for social media). Centralized data storage can also improve the user experience and make websites more performant.

  • Arweave is an incredibly good blockchain-based storage solution. I was unable to find any dead links in my research. The main issue was that I couldn’t retrieve the data in under half a second as consistently as from the Phaver API. Platforms can mitigate this by caching the data and storing copies in their own systems. I personally believe this is the best solution of the three.

  • IPFS, on the other hand, wasn’t as consistent or performant. Accessing the data was quite slow in most cases, and in some cases I was unable to access it at all.
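The caching mitigation mentioned for Arweave can be sketched in a few lines. This is a hypothetical in-memory version in Python, under the assumption that content referenced by an Arweave transaction ID or an IPFS CID never changes, so cached copies never need to expire. A real platform would add size limits and persistent storage.

```python
from typing import Callable


class ContentCache:
    """In-memory copy of fetched content, keyed by content URI.

    ASSUMES the bytes behind each URI are immutable (Arweave transaction
    IDs, IPFS CIDs), so entries are never invalidated.
    """

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch   # slow network fetch, e.g. via an Arweave gateway
        self._store = {}      # uri -> bytes
        self.hits = 0
        self.misses = 0

    def get(self, uri: str) -> bytes:
        """Return cached bytes if present, otherwise fetch once and keep a copy."""
        if uri in self._store:
            self.hits += 1
            return self._store[uri]
        self.misses += 1
        data = self._fetch(uri)
        self._store[uri] = data
        return data
```

Only the first request for a URI pays the storage network’s latency; every later request is served from the platform’s own copy.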

Explore the data below:

Sentiment Analysis

I was also personally curious about doing sentiment analysis on Lens Protocol data. Research has shown that the Twitter/Facebook algorithms (which none of us can view or audit) tend to amplify user outrage in a number of different ways. It’s also worth noting that keeping users on their platforms longer has considerable monetary upside, so driving engagement is likely the goal of these algorithms. But what if you could create a highly customizable social media feed where users themselves choose their algorithm, for example one that promotes the most positive content and/or engagement?
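Here is a minimal Python sketch of a user-selectable feed. Each ranking is a small, inspectable key function, and the user picks one by name. The `Post` fields, the ranking names, and the tie-breaking choices are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    id: str
    sentiment: float  # -1 (negative) .. 1 (positive)
    engagement: int   # e.g. likes + comments + mirrors
    timestamp: int    # Unix seconds


# HYPOTHETICAL, fully transparent rankings a user could choose between.
RANKINGS: dict[str, Callable[[Post], tuple]] = {
    "chronological": lambda p: (p.timestamp,),
    "most_engaging": lambda p: (p.engagement, p.timestamp),
    "most_positive": lambda p: (p.sentiment, p.engagement),
}


def build_feed(posts: list, algorithm: str) -> list:
    """Order posts by the user's chosen algorithm, best first."""
    key = RANKINGS[algorithm]
    return sorted(posts, key=key, reverse=True)
```

Because the social graph is shared, two platforms could serve the same posts and differ only in the entries of a table like `RANKINGS`.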

I used the Google Cloud Natural Language API for sentiment analysis and for categorizing posts and comments by topic. I started by looking at the average sentiment by day across all posts. I only included posts that had at least 20 words (no “gm”) and that I could retrieve within 0.5 seconds. After these exclusions, I analyzed 66,515 posts (through October 18th). Here a score of 1 would be extremely positive, and -1 extremely negative.
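The exclusions and the daily averaging described above can be sketched in Python as follows. This is not the author’s code. It assumes each post already carries a sentiment `score` (in practice obtained from the Natural Language API, which is not called here), a `day` string, and a measured `fetch_seconds`; all of these field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MIN_WORDS = 20           # skip very short posts such as "gm"
MAX_FETCH_SECONDS = 0.5  # only content retrievable within half a second


def eligible(post: dict) -> bool:
    """Apply the two exclusions from the analysis (field names are hypothetical)."""
    return (len(post["text"].split()) >= MIN_WORDS
            and post["fetch_seconds"] <= MAX_FETCH_SECONDS)


def daily_average_sentiment(posts: list) -> dict:
    """Average sentiment score (-1 .. 1) per day over eligible posts."""
    by_day = defaultdict(list)
    for post in posts:
        if eligible(post):
            by_day[post["day"]].append(post["score"])
    return {day: mean(scores) for day, scores in sorted(by_day.items())}
```

Splitting the text on whitespace is a simple stand-in for word counting; days with no eligible posts simply drop out of the result.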

Posts Average Daily Sentiment through Oct 18

Overall, the sentiment has been more positive than negative: the color scale in the legend on the right goes up to 0.4 but not much lower than 0.1. There have also been some pretty significant spikes in the average daily sentiment compared to the first couple of months.

For this next chart, I looked at the predicted category of each post and calculated the average overall sentiment for each category. Lighter colors indicate more posts about the topic.

Average Sentiment by Predicted Category

Finance was the top topic users posted about, with Law & Government, News, and Finance standing out with negative sentiment, compared to Online Communities, Sports, Arts & Entertainment, and Games, which showed more positive sentiment.
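The per-category averaging can be sketched in Python like this. It assumes each post has a path-style predicted label such as "/Finance/Investing" (the form Google’s text classifier returns), which is collapsed to its top-level category. That collapsing step is an assumption, suggested by the top-level names in the chart.

```python
from collections import defaultdict
from statistics import mean


def top_level(category: str) -> str:
    """Reduce a path-style label such as '/Finance/Investing' to 'Finance'."""
    return category.strip("/").split("/")[0]


def sentiment_by_category(posts: list) -> list:
    """Return (category, average sentiment, post count), most-posted first."""
    groups = defaultdict(list)
    for post in posts:
        groups[top_level(post["category"])].append(post["score"])
    rows = [(cat, mean(scores), len(scores)) for cat, scores in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The post count returned alongside each average is what the chart’s lighter shading encodes.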

For this last chart, I searched each post for specific keywords to assign it to a topic, and calculated the average overall sentiment for each topic. Lighter colors indicate more posts about the topic.

Average Sentiment by Topic Mentioned

Here we see the most negative sentiment associated with posts about inflation and Russia, while Lens gets by far the best sentiment of the topics we looked at. It’s also interesting that Bitcoin and Crypto had negative sentiment, most likely due to the bear market, while Ethereum managed a somewhat better average sentiment.
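Keyword-based topic assignment could look like the Python sketch below. The keyword lists are hypothetical, since the exact terms used for the chart aren’t given. Matching on whole words avoids false hits such as "Lenster" counting as "Lens", and a post that mentions several topics counts toward each of them.

```python
import re
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL keyword lists: the exact terms used for the chart aren't given.
TOPIC_KEYWORDS = {
    "Lens": {"lens"},
    "Ethereum": {"ethereum", "eth"},
    "Bitcoin": {"bitcoin", "btc"},
    "Crypto": {"crypto"},
    "Inflation": {"inflation"},
    "Russia": {"russia", "russian"},
}


def topics_mentioned(text: str) -> set:
    """Topics whose keywords appear as whole words (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}


def sentiment_by_topic(posts: list) -> dict:
    """Average sentiment per topic; a post can count toward several topics."""
    groups = defaultdict(list)
    for post in posts:
        for topic in topics_mentioned(post["text"]):
            groups[topic].append(post["score"])
    return {topic: mean(scores) for topic, scores in groups.items()}
```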

Posts with sentiment analysis scores and category predictions:


GitHub repository with full reproducible code:

Ricky on Lens:

Ricky on Twitter:

Please reach out with any questions! I’m also working on a new R programming tutorial that always ingests the latest Lens data when the tutorial is accessed, so stay tuned for that 🙂
