scaling knowledge extraction from closed-source models on Commune

Synthetic data has emerged as a crucial component in the training of advanced models. Major AI labs have long recognized its potential and actively use it in their own training pipelines.

However, access to such data remains limited for the broader Open-Source community. The Synthia subnet aims to change that.

Synthia leverages the Commune protocol to create a permissionless mining market for extracting knowledge from closed-source model APIs into a public dataset, with the goal of accelerating the Open-Source AI space. We will adapt the targeted models and our strategy as the state of the art evolves.

The subnet's output is a continuously growing aggregate of validated synthetic explanations. Subjects are picked from the Claude Opus latent space, with varying esotericity, across a vast general list of technical and scientific fields. The explanations vary in target audience, level of detail and abstraction, and miners are incentivized to target Claude 3-grade quality.

example miner output

Validated miner outputs above a quality threshold are automatically uploaded to the public Synthia dataset on Hugging Face.
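To make the thresholding step concrete, here is a minimal sketch of how validated outputs might be filtered before upload. Everything in it is an assumption for illustration: the `MinerOutput` fields, the `select_for_upload` helper, the score range and the threshold value are hypothetical and do not reflect Synthia's actual schema or validator code.

```python
from dataclasses import dataclass


# Hypothetical record shape; field names are illustrative only,
# not Synthia's actual schema.
@dataclass(frozen=True)
class MinerOutput:
    subject: str
    field: str
    target_audience: str
    explanation: str
    score: float  # validator-assigned quality score, assumed to lie in [0, 1]


# Assumed value, chosen only for this example.
QUALITY_THRESHOLD = 0.8


def select_for_upload(outputs, threshold=QUALITY_THRESHOLD):
    """Return the outputs whose score is strictly above the threshold."""
    return [o for o in outputs if o.score > threshold]


samples = [
    MinerOutput("Gauge symmetry", "Physics", "graduate student",
                "A gauge symmetry is a redundancy in the description...", 0.91),
    MinerOutput("Merkle trees", "Computer Science", "hobbyist",
                "A Merkle tree hashes data in pairs...", 0.55),
]

# Only records above the threshold would move on to the public dataset.
kept = select_for_upload(samples)
```

In this sketch only the first sample would be kept. The actual upload to the dataset is left out on purpose, since the source does not describe how it is performed.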

We also have a Hugging Face leaderboard of miners and their rewards.

You can start mining or validating by following our docs.

Synthia aims to create the largest reliably high-quality synthetic knowledge dataset in the world and make it a public good that serves as a catalyst for innovation in the Open-Source AI space. It will be a foundational resource on which Commune can build further data processing mechanisms, such as a fine-tuning subnet.

We are excited to expand the breadth of synthetic data generation markets beyond explanations, building on top of this foundation.

onwards 🫡
