Let’s be frank: the data economy has been growing like never before, but we still lack digital marketplaces for datasets. If you search Google for websites to buy and sell data, you’ll find very few options. There’s Data and Sons, which is probably the main one for both sellers and buyers, but the rest mostly cater to buyers only. And then there are free platforms such as Kaggle. This can be frustrating for individuals and small companies looking to sell their data.
Fortunately, I was able to find a platform in the web3 space that lets me easily submit datasets for sale. These assets are acquired with the platform’s cryptocurrency token, which can be a blocking point for people less familiar with this technology. However, don’t quit just yet: you’ll see that buying and selling assets on this platform is not so complicated, and probably easier than with more conventional solutions.
In this article, I’ll cover the steps to upload datasets to this crypto marketplace. We’ll first take a look at a manual way of doing it, and then we’ll use a Python library to seamlessly submit datasets for sale on the same platform.
⚠️ NOTE: To fully understand the steps, you should have at least a crypto wallet. For the developer part, it’s good to have a basic understanding of the Python programming language.
Decentralized storage solutions work much like torrents: data is stored across several computers to ensure decentralization, security and permanence. We’ll cover two main solutions.
IPFS was designed to create a decentralized network of interconnected nodes, where content is distributed across these nodes.
While there are more technical ways of using IPFS, with JavaScript, Rust and more, the most user-friendly approach is IPFS Desktop.
With this, you can simply click on Import and upload your file/dataset. It will automatically generate a CID (content identifier) for your file. This code can be used to retrieve the file from IPFS and also to fetch the data in the marketplace.
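If you want to double-check the upload, any public IPFS gateway can serve the file by its CID. Here is a minimal sketch in Python using the requests library; the CID is a placeholder you should swap for your own:

import requests

# Placeholder CID - replace it with the one generated by IPFS Desktop
cid = "<your_cid>"

# ipfs.io is one of several public gateways
response = requests.get(f"https://ipfs.io/ipfs/{cid}", timeout=30)
response.raise_for_status()

# write the retrieved dataset to disk
with open("dataset.csv", "wb") as f:
    f.write(response.content)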
Arweave is my preferred solution because it is much faster and uses a blockchain-like storage architecture. It stands as a permanent data storage platform, and anyone can use it to upload data to the permaweb. Unlike IPFS, with Arweave there are several gateways through which to store the data. The simplest one I’ve found so far is Akord, because the experience is very similar to the web2 cloud providers we are used to, such as Google Drive and OneDrive. You can create your account here.
You pay only for the storage you need; there’s no monthly or annual subscription. In addition, Akord accepts payments in USD, so you don’t need to be a crypto native to use it.
The platform is organized into vaults, which you can make public or encrypted.
After uploading a file to your vault, you’ll see three dots next to it. Clicking on them reveals the URLs from which you can retrieve your data outside the Akord application.
In both URLs you’ll see a code like the following:
uS5rgQ5simDwCHy8NdJ...
This represents your transaction ID, and just like the CID from IPFS, we’ll use it to set our dataset for sale.
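As a quick sanity check, the file can also be retrieved outside Akord by appending the transaction ID to a public Arweave gateway URL. A minimal sketch, with a placeholder ID:

import requests

# Placeholder transaction ID - use the one shown in your Akord vault
tx_id = "<your_transaction_id>"

# arweave.net is the default public gateway
response = requests.get(f"https://arweave.net/{tx_id}", timeout=30)
response.raise_for_status()
print(f"Retrieved {len(response.content)} bytes")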
👉 Create your Akord account with my referral link.
If you don’t have an Ethereum cryptocurrency wallet yet, now is the time to create one. You can easily do it with MetaMask. Then I recommend having some MATIC on the Polygon chain to pay for gas fees. This is important because gas fees are extremely high on the Ethereum mainnet, and the platform lets you set data for sale on several L2 solutions, such as Polygon and Optimism.
If you don’t have MATIC on the Polygon chain, you have two options: either buy it directly in MetaMask, or bridge your ETH to Polygon and then swap it for MATIC. I did the latter, using the Polygon Portal.
This may seem complicated, but once you have some MATIC to pay for gas fees on Polygon, you won’t need to repeat this step for a while. The same applies if you want to buy a dataset.
Once you have uploaded the dataset to one of the decentralized storage solutions and have a wallet with some MATIC on Polygon to pay for gas fees, we can fetch the file through its ID and sell it on the platform.
We are now ready to use the marketplace, which is the Ocean Market, by the Ocean Protocol.
The Ocean Protocol is a web3 protocol that allows businesses and individuals to exchange and monetize data through a rich stack of tools for Data Scientists, such as the following:
Ocean.py: a Python package that allows users to publish data, run compute jobs on datasets with Machine Learning algorithms, transfer data assets and much more.
Data Challenges: every two to three weeks, the Ocean Protocol launches Data Science challenges, with rewards for the participants in the form of crypto assets.
Predictoor: a decentralized prediction bot where Data Scientists can submit their predictions and stake tokens on them, while traders can use Predictoor to trade, like any other trading bot.
Ocean Market: in this marketplace, users can sell and buy datasets, algorithms, reports and more.
Let’s now take a deeper look at the Ocean Market and set our dataset for sale.
When you enter the Ocean Market’s website, you’ll see an option in the top right corner to connect your wallet. You can do it with MetaMask or WalletConnect, but make sure you have the Polygon network selected. Once connected, you’ll see something similar to the image above. Then click Publish in the top left corner. You should get something like this:
Make sure you have Polygon selected at the top. Fill in all the fields and press CONTINUE.
As you can see above, there are several options to fetch the file, Arweave and IPFS among them. For the former, you copy-paste the transaction ID; for the latter, you use the CID (both are explained in step 1). You can also upload a sample file from OneDrive and other cloud providers.
It is now time to publish your asset for sale; you can specify a price or offer it for free. Once done, click on CONTINUE and you’re ready to submit it. During the submission phase, you’ll be asked to sign two transactions, which will cost you some gas fees.
In the end, you should get a data asset that follows the same structure as this.
At the top, you’ll see the option Profile next to Publish. In this section, you can keep track of all your data assets, edit them and share them.
Now we’ll take a look at how to do it in an automated way, by using the Ocean.py Python library.
The Ocean.py Python package is an Ocean Protocol toolset for developers. It allows one to perform several actions within the protocol, such as publishing data and running Machine Learning algorithms on it. In this piece, we’ll only look at how to submit and set data assets for sale with this library.
Before we start using the library, we need to create an account on Infura to make a JSON-RPC connection to an Ethereum node. If you have your own node, you can use it instead.
Once you’re connected, you need to create a new API key.
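Before going further, you can check that the key works with a quick JSON-RPC call. Here is a minimal sketch using web3.py (an extra dependency, installable with pip install web3); it assumes your API key is stored in an INFURA_TOKEN environment variable, as in the script later in this article:

import os
from web3 import Web3

# Assumes the Infura API key is exported as INFURA_TOKEN
infura_token = os.getenv('INFURA_TOKEN')

# Same Mumbai endpoint used by make_config below
w3 = Web3(Web3.HTTPProvider(f"https://polygon-mumbai.infura.io/v3/{infura_token}"))

# is_connected() is the web3.py v6 name; older versions use isConnected()
print(w3.is_connected())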
Before implementing the functions and the script, we need to install the necessary libraries:
pip install ocean-lib
pip install eth-account
You can learn more about the Ocean.py package in the following GitHub repository:
Let’s start by making a function that wraps our Ethereum connection in the Ocean class and returns two ocean objects.
from ocean_lib.example_config import get_config_dict
from ocean_lib.ocean.ocean import Ocean
from ocean_lib.ocean.util import to_wei

# Configuration for Polygon - Mumbai
def make_config(token, net='mumbai'):
    # Infura JSON-RPC endpoints per network
    networks = {
        'mumbai': "https://polygon-mumbai.infura.io/v3/",
        'polygon': "https://polygon-mainnet.infura.io/v3/"
    }
    config = get_config_dict(f"{networks[net]}{token}")
    ocean_obj = Ocean(config)
    ocean_tkn = ocean_obj.OCEAN_token
    return ocean_obj, ocean_tkn
The function takes two inputs: token, which is the Infura API key we’ve just created, and the network. By default, it uses the mumbai testnet, but you can switch to polygon.
The output ocean_tkn is used for actions concerning the Ocean Protocol token ($OCEAN), while ocean_obj handles general actions within the protocol.
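For example, a call on the Mumbai testnet would look like this (again assuming the Infura API key is stored in an INFURA_TOKEN environment variable):

import os

# ocean_obj handles general protocol actions, ocean_tkn the $OCEAN token
ocean_obj, ocean_tkn = make_config(os.getenv('INFURA_TOKEN'), net='mumbai')
print(ocean_tkn.address)  # address of the $OCEAN token contract on Mumbai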
Let’s now use ocean_obj to create an asset in the marketplace, with information such as title, description, author and tags.
# Create data NFT, datatoken and DDO
def create_asset_from_arweave(
        ocean_obj,
        title,
        publish_address,
        arweave_id,
        author, description, tags):
    # create metadata
    metadata = ocean_obj.assets.__class__.default_metadata(
        title, {"from": publish_address})
    metadata.update({
        'description': description,
        'author': author,
        'tags': tags,
    })
    # create the Arweave-backed asset
    data_nft, datatoken, ddo = ocean_obj.assets.create_arweave_asset(
        title, arweave_id, {"from": publish_address}, metadata=metadata)
    return data_nft, datatoken, ddo
In the function above, we fetch the file from Arweave, so we use the transaction ID as the arweave_id input. The publish_address is the Ethereum account that identifies the publisher and pays the gas fees. The outputs are a data NFT, a datatoken and a DDO. We’ll mainly be using the datatoken.
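Once the call succeeds, you can quickly inspect what was created. A small sketch (the exact attributes may vary between Ocean.py versions):

# the data NFT and datatoken are on-chain contracts, so they expose addresses
print(f"Data NFT address:  {data_nft.address}")
print(f"Datatoken address: {datatoken.address}")

# the DID identifies the asset in the Ocean Market
print(f"Asset DID: {ddo.did}")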
With the previous function, we make sure the asset exists in the marketplace, but it is still not for sale. For that, we need the following:
def post_for_sale(
        ocean_tkn,
        datatoken,
        publish_address,
        price,
        n_datatokens):
    # create a fixed-rate exchange priced in $OCEAN
    exchange = datatoken.create_exchange(
        {"from": publish_address}, to_wei(price), ocean_tkn.address)
    # mint the datatokens that will be put up for sale
    datatoken.mint(
        publish_address, to_wei(n_datatokens), {"from": publish_address})
    # allow the exchange to spend the minted datatokens
    datatoken.approve(
        exchange.address, to_wei(n_datatokens), {"from": publish_address})
    return exchange
If you’re using the Mumbai network, you may need some faucet tokens in your wallet. You can request MATIC here.
It is now time to put together the complete submission flow, using all the functions mentioned above.
import os
import dotenv
from eth_account.account import Account
from upload_utils import make_config, create_asset_from_arweave, post_for_sale

PATH = os.path.dirname(os.path.dirname(__file__))
dotenv.load_dotenv(f"{PATH}/keys.env")

# VARIABLES OF DATASET
TITLE = "Post3 Dataset | Mirror Entries from Week 472023"
AUTHOR = "Post3"
DESCRIPTION = "<my_description>"
TAGS = ['dataset', 'data-nft', 'data-analysis', 'data-mining']

# create ocean objects
ocean_obj, ocean_tkn = make_config(
    os.getenv('INFURA_TOKEN'),
    'mumbai')

# load the publisher account from its private key
my_private_key = os.getenv('REMOTE_TEST_PRIVATE_KEY1')
my_address = Account.from_key(private_key=my_private_key)

if __name__ == '__main__':
    # create the dataset asset
    data_nft, datatoken, ddo = create_asset_from_arweave(
        ocean_obj,
        title=TITLE,
        publish_address=my_address,
        arweave_id=os.getenv('ARWEAVE_ID'),
        author=AUTHOR,
        description=DESCRIPTION,
        tags=TAGS)

    # post it for a price
    post_for_sale(
        ocean_tkn,
        datatoken,
        publish_address=my_address,
        price=35,
        n_datatokens=10)
This script uses another package, eth_account, which is used to create wallets or build Python objects from existing Ethereum wallets. In this case, we use it to load the publisher account from its private key.
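For reference, this is roughly what eth_account does under the hood; the private key below is a hypothetical placeholder, never hardcode a real one:

from eth_account.account import Account

# Hypothetical private key (32 bytes, hex-encoded) - for illustration only
account = Account.from_key("0x" + "1" * 64)

# the public address derived from the key
print(account.address)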
By running this script, I got the following asset in the Ocean Market:
You can find more assets that I’ve published, both with Ocean.py and manually, here.
The Ocean Market represents a new and innovative approach to the data economy. The setup can be difficult for people less familiar with cryptocurrencies, and it might not reach a wider market for the time being, since it is still very focused on the web3 realm. However, it can be a great platform when you’re targeting an audience familiar with the crypto economy.
I decided to explore the Ocean.py package because I wanted to automate the process of submitting datasets for sale. I integrated the functions presented in this article within a web application to upload datasets and return insightful dashboards. If you’re curious about the project you can learn more in the following article:
I encourage you to explore other functionalities of the Ocean.py package and try out the rest of the Ocean Protocol toolset, since projects merging Data Science and AI with blockchain are seeing a significant increase in adoption. Consider exploring other projects in the field, such as Bittensor and Commune AI.