Model Ownership: Vana's Decentralized Data Co-operative
Author: SCB 10X
Uploaded: 2025-06-18
Views: 14842
Description:
1/ Vana is building infrastructure for users to collectively own their data and create user-owned AI models. With over 1 million users contributing data across 16 operational data DAOs, they're targeting 100 million users to aggregate 450 trillion tokens - 30x larger than any single AI company's dataset.
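A quick sanity check of the scale figures in this point, as a minimal sketch. The constant names are illustrative, and the only inputs are the numbers quoted above. The implied baseline of 15 trillion tokens matches the public-internet figure cited in point 2.

```python
# Figures quoted in the description; names are illustrative, not Vana terminology.
TARGET_USERS = 100_000_000          # 100 million users
TARGET_TOKENS = 450 * 10**12        # 450 trillion tokens
SCALE_FACTOR = 30                   # "30x larger than any single AI company's dataset"

# Average contribution each user would need to supply to hit the target.
tokens_per_user = TARGET_TOKENS // TARGET_USERS

# Size of the largest single-company dataset implied by the 30x claim.
implied_largest_dataset = TARGET_TOKENS // SCALE_FACTOR

print(f"tokens per user: {tokens_per_user:,}")
print(f"implied largest single-company dataset: {implied_largest_dataset:,} tokens")
```

In other words, the target works out to an average of 4.5 million tokens per user.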
2/ The data wall problem is real: AI companies have exhausted the public internet's ~15 trillion tokens. But this represents only 0.1-4% of total internet data. Most data sits behind walled gardens - your messages, documents, and interactions that never reach the public web.
3/ Users legally own their data, much as you still own a car you leave in a parking lot. Platforms don't own your Facebook posts or Reddit comments; they merely have permission to use them to operate their service. GDPR, CCPA, and Utah's Digital Choice Act reinforce these rights, letting users request an export of their data within 30 days.
4/ Vana's Reddit Data DAO achieved 140,000 users in its first week, forcing Reddit to change data policies. Top users earned $300-400 for their data. This validated that users will overcome friction (including setting up crypto wallets) when they realize their data's value.
5/ Cross-platform data is roughly 100x more valuable than single-source data. Spotify data alone costs $0.30 on open markets, but combined with fashion and demographic data it reaches $25 (about 83x). Only users can aggregate this cross-platform data: Spotify can't sell fashion data it doesn't have.
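The multiplier behind this claim can be checked directly from the two prices quoted. This is a minimal sketch; the function name is illustrative and the prices are the only inputs taken from the description.

```python
def value_multiplier(single_source_price: float, combined_price: float) -> float:
    """Return how many times more the combined dataset is worth."""
    if single_source_price <= 0:
        raise ValueError("single_source_price must be positive")
    return combined_price / single_source_price

# Prices quoted in the description (USD per user record).
spotify_only = 0.30
spotify_plus_fashion_and_demographics = 25.0

multiplier = value_multiplier(spotify_only, spotify_plus_fashion_and_demographics)
print(f"multiplier: {multiplier:.1f}x")
```

On these two prices the ratio comes out a little above 83x, so the "100x" figure is best read as an order-of-magnitude claim.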
6/ Vana built a custom L1 blockchain instead of using existing infrastructure due to regulatory requirements. Centralized solutions (like L2s with central sequencers) would be regulated as data processors under GDPR, requiring impossible features like on-chain data deletion.
7/ The dual validator system uses traditional L1 validators plus data validators running in trusted execution environments. This enables programmable privacy - users can grant specific operations on their data while maintaining cryptographic control through "non-custodial data."
8/ Proof of contribution scores data quality using metrics like account age, data volume, and LLM quality checks. Data is then structured in SQL format for granular access control. Post-processing ensures proper demographic representation and addresses over/undersampling issues.
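To make the scoring idea concrete, here is a hypothetical sketch of how the three metrics named above could be combined into one score. The weights, caps, field names, and the weighted-sum formula are all assumptions for illustration; the description does not say how Vana actually computes proof of contribution.

```python
from dataclasses import dataclass

@dataclass
class Contribution:
    account_age_days: int      # age of the source account
    data_volume_bytes: int     # size of the contributed export
    llm_quality: float         # assumed 0..1 result of an LLM quality check

# Hypothetical weights and caps, chosen only for this example.
WEIGHTS = {"age": 0.2, "volume": 0.3, "quality": 0.5}
AGE_CAP_DAYS = 5 * 365
VOLUME_CAP_BYTES = 50 * 1024 * 1024

def contribution_score(c: Contribution) -> float:
    """Weighted sum of capped, normalized metrics; result lies in [0, 1]."""
    age = min(max(c.account_age_days, 0) / AGE_CAP_DAYS, 1.0)
    volume = min(max(c.data_volume_bytes, 0) / VOLUME_CAP_BYTES, 1.0)
    quality = min(max(c.llm_quality, 0.0), 1.0)
    return (WEIGHTS["age"] * age
            + WEIGHTS["volume"] * volume
            + WEIGHTS["quality"] * quality)

new_account = Contribution(account_age_days=30, data_volume_bytes=1_000_000, llm_quality=0.4)
old_account = Contribution(account_age_days=3000, data_volume_bytes=80_000_000, llm_quality=0.9)
print(contribution_score(new_account), contribution_score(old_account))
```

Capping each metric keeps a single very large or very old account from dominating the score, which is one simple way to discourage padding an export with low-value data.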
9/ Vana partnered with Flower Labs to build COLLECTIVE-1, the world's first user-owned foundation model (7B parameters). This marks a milestone - 18 months ago, distributed pre-training was considered impossible. The model trained on text data from across Vana's data DAOs.
10/ The dual token system features VANA as the native asset plus data-specific tokens (VRC-20 standard). Top data DAOs earn rewards for onboarding useful data, measured by access frequency and contributor count. This tokenizes data as a tradeable asset class.
11/ Private data faces unique challenges: some model architectures allow reverse engineering training data from weights. Data DAOs must approve code running on their data, currently a manual process that will become more automated while balancing privacy guarantees with performance needs.
12/ Big tech companies often can't train on their own data due to ToS restrictions and PR concerns. Vana's co-founder previously sold data to tech giants that were buying data from messaging platforms they operated themselves but couldn't legally access for training.
13/ Key expansion areas include robotics (severely data-constrained), healthcare (regulatory barriers limit training data), and bio research (needs ground truth data beyond publications). Major platforms like Google Docs, Discord, and Twitch still lack data DAOs.
14/ Vana Academy launched to train builders on creating data businesses. The ecosystem of full-time data DAO builders is now larger than Open Data Labs and the Vana Foundation combined, with specialists from quantum research to enterprise sales joining.
15/ For builders: This is an asymmetric bet opportunity in the AI/crypto frontier. Start a data DAO, access unique datasets for model training, or build DeFi applications using data as collateral. The infrastructure enables entirely new interaction patterns with private data.