High-quality, provenance-verified, values-aligned, community-owned.
Aiua™ is a daily guided self-reflection platform where humans respond to prompts that scale from easy life questions to complex moral dilemmas to AI-personalized Socratic inquiry. Every response is scored across 12 value dimensions by Claude (Anthropic) and published to this open dataset under CC0.
Unlike scraped web data, every contribution here was deliberately created. Contributors can go deeper with up to three rounds of Socratic follow-up, and judge anonymous preference pairs — creating multi-turn dialogue and RLHF/DPO training data that doesn't exist anywhere else.
The dataset is designed for reward modeling, instruction fine-tuning, and values classification research. The free export includes all scores, prompts, and provenance. A paid API adds normalized scores, quality signals, demographic cross-references, full Socratic dialogue transcripts, and preference pair judgments with timing data.
This dataset exists because AI alignment needs better training data. Most alignment datasets are preference pairs from crowd workers — shallow, narrow, and inauthentic. Aiua captures what real people actually believe, value, and experience in their own words, scored against a consistent framework rooted in cross-cultural moral philosophy. Use it to fine-tune AI systems that understand human values like wisdom, compassion, and empathy — not just preferences.
Every contribution was written in response to a personalized prompt designed to surface genuine values. Prompts scale from easy life questions to moral dilemmas to AI-personalized Socratic inquiry. Nothing was scraped; every record was contributed deliberately and with consent.
Each contribution is scored across 12 dimensions (250-point rubric) AND optionally followed by up to three rounds of Socratic dialogue. Preference pairs provide direct human judgments. This gives researchers scores, dialogue, AND preferences — not just text.
Weekly Merkle roots anchored to Cardano. Full dataset on Arweave. Premium fields encrypted with AES-256-GCM; keys escrowed on-chain. A dead man's switch publishes all keys if the platform becomes inactive. The data survives independently.
All contributions are AI-sanitized to remove personally identifiable information. Voice recordings are never stored. Each weekly export includes world context metadata — top headlines and current events — so future researchers understand when and why these reflections were written.
Optimized for Machine Learning & Fine-Tuning. One record per contribution.
{
"id": "f7a3c2e1-...",
"prompt": "Is it ever right to lie to protect someone you love?",
"text": "I found myself in [a hospital in the Pacific Northwest]...",
"language": "en",
"created_at": "2026-03-14T09:23:11Z",
"voice_used": true,
"prompt_difficulty": "medium",
"prompt_source": "cached",
"scores": {
"total": 171,
"life": 19, "liberty": 16, "kinship": 18, "ecology": 17,
"legacy": 15, "truth": 13, "justice": 12, "wisdom": 9,
"perspective": 8, "humility": 8, "authenticity": 19, "depth": 17
},
"scoring_model": "rubric-v2.0",
"depth_rounds": 2,
"preference_stats": { "times_compared": 14, "win_rate": 0.71 },
"provenance": {
"content_hash": "sha256:a3f2...",
"merkle_root": "b7c9e1...",
"cardano_tx": "tx_abc123..."
},
"premium": "ENC:AES256GCM:iv:tag:encrypted_blob..."
}

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique contribution identifier |
| prompt | string | The reflection prompt shown to the contributor |
| text | string | The contributor's response, sanitized — identifying details replaced with [bracketed generalizations] |
| language | string | ISO 639-1 language code (auto-detected) |
| voice_used | boolean | Whether the response was spoken and transcribed |
| prompt_difficulty | string | "easy" · "medium" (moral dilemmas) · "hard" (philosophical) · "deep" (AI-personalized) |
| prompt_source | string | "cached" · "ai_generated" · "ai_personalized" |
| scores | object | Raw integer scores for each of the 12 dimensions + total (250 max) |
| scoring_model | string | Version of the scoring rubric used (e.g. "rubric-v2.0") |
| depth_rounds | integer | Number of Go Deeper Socratic follow-up rounds completed (0-3) |
| preference_stats | object | Aggregate: times_compared and win_rate from preference pair judgments |
| provenance | object | SHA-256 content hash, Merkle root, and Cardano transaction ID |
| premium | string | Encrypted blob (AES-256-GCM) containing premium fields — decryptable with era master key |
| created_at | ISO 8601 | UTC timestamp of contribution submission |
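A quick sanity check one can run over exported records: the 12 dimension scores should sum to `scores["total"]`. Field names follow the table above; treating the total as a plain sum of dimensions is an assumption about the rubric, not a documented guarantee.

```python
# Consistency check: do the 12 dimension scores add up to "total"?
# (Assumes the rubric total is the plain sum of the dimensions.)
DIMENSIONS = ["life", "liberty", "kinship", "ecology", "legacy", "truth",
              "justice", "wisdom", "perspective", "humility",
              "authenticity", "depth"]

def check_total(scores: dict) -> bool:
    """Return True if the dimension scores sum to scores["total"]."""
    return sum(scores[d] for d in DIMENSIONS) == scores["total"]

# Toy record (values are illustrative, not from the dataset)
example = {"total": 40, "life": 10, "liberty": 5, "kinship": 5, "ecology": 5,
           "legacy": 3, "truth": 2, "justice": 2, "wisdom": 2,
           "perspective": 2, "humility": 1, "authenticity": 2, "depth": 1}
print(check_total(example))  # True
```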
from datasets import load_dataset
# Load full dataset
ds = load_dataset("AiuaEarth/AiuaArchive")
# Filter by quality
high = ds["train"].filter(lambda x: x["scores"]["total"] >= 150)
# Filter by difficulty level
dilemmas = ds["train"].filter(lambda x: x["prompt_difficulty"] == "medium")
# Multi-turn only (had Socratic follow-up)
deep = ds["train"].filter(lambda x: x["depth_rounds"] > 0)
# Most-compared contributions (preference game)
compared = ds["train"].filter(
lambda x: x.get("preference_stats") and x["preference_stats"]["times_compared"] >= 5
)

Logarithmic milestone timeline — matching phase trigger thresholds.
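The aggregate `preference_stats` can be turned into DPO-style training pairs. A minimal sketch, assuming the field names above and a simple pairing heuristic (highest vs. lowest win rate among responses to the same prompt) — the heuristic is ours, not an official Aiua recipe:

```python
# Sketch: build (prompt, chosen, rejected) pairs from aggregate win rates.
# Pairing highest vs. lowest win_rate per prompt is an assumed heuristic.
from collections import defaultdict

def build_preference_pairs(records, min_comparisons=5):
    """Group records by prompt; emit chosen/rejected pairs by win_rate."""
    by_prompt = defaultdict(list)
    for r in records:
        stats = r.get("preference_stats")
        if stats and stats["times_compared"] >= min_comparisons:
            by_prompt[r["prompt"]].append(r)
    pairs = []
    for prompt, group in by_prompt.items():
        if len(group) < 2:
            continue  # need at least two judged responses to form a pair
        group.sort(key=lambda r: r["preference_stats"]["win_rate"])
        pairs.append({"prompt": prompt,
                      "chosen": group[-1]["text"],
                      "rejected": group[0]["text"]})
    return pairs

# Toy records (illustrative values only)
toy = [
    {"prompt": "p", "text": "a",
     "preference_stats": {"times_compared": 14, "win_rate": 0.71}},
    {"prompt": "p", "text": "b",
     "preference_stats": {"times_compared": 9, "win_rate": 0.30}},
]
print(build_preference_pairs(toy))
# [{'prompt': 'p', 'chosen': 'a', 'rejected': 'b'}]
```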
Three independent layers. Verifiable by anyone.
Full weekly JSONL exports with public fields in plaintext and premium fields encrypted (AES-256-GCM). Pay-once permanent storage. Includes RUBRIC.md, weekly world context, and resonance distributions. A dead man's switch auto-publishes decryption keys if the platform becomes inactive.
{
  transactions(tags: [{ name: "App-Name", values: ["Aiua-AI"] }]) {
    edges { node { id block { timestamp } } }
  }
}

Each week, a Merkle root of all contribution hashes is posted to Cardano mainnet as transaction metadata. Encryption master keys are escrowed on-chain. Weekly context metadata anchors the data in world events for future researchers.
Full dataset published weekly to Hugging Face Hub. Load in one line of Python. Versioned with full commit history. YAML frontmatter enables automatic indexing and citation. Common Crawl scrapes HuggingFace — the dataset will appear in future web crawls.
View dataset on HuggingFace →

Every contribution in the public dataset includes a SHA-256 content hash. To verify provenance: find the contribution ID, compute the hash of the sanitized content, and verify it exists in the corresponding weekly Merkle root anchored on Cardano.
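The verification steps above can be sketched in a few lines. The leaf encoding (UTF-8 text hashed with SHA-256) and the inclusion-proof shape (a list of sibling hashes combined with sorted-pair hashing) are assumptions for illustration; the platform's actual Merkle construction may differ.

```python
# Sketch: verify a contribution against a weekly Merkle root.
# Sorted-pair hashing over hex digests is an assumed proof format.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_inclusion(text: str, proof: list[str], merkle_root: str) -> bool:
    """Fold the leaf hash up the tree using the sibling hashes in proof."""
    node = sha256_hex(text.encode("utf-8"))
    for sibling in proof:
        # Sort the pair so verification doesn't depend on left/right order
        node = sha256_hex("".join(sorted([node, sibling])).encode("utf-8"))
    return node == merkle_root
```

The on-chain root for the relevant week comes from the Cardano transaction metadata referenced in the record's `provenance` object.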
No rights reserved. Anyone may use, copy, modify, distribute, or build upon this dataset for any purpose — including commercial purposes — without asking permission or giving credit.
We believe AI training data should belong to humanity. The contributors who built this dataset chose CC0 deliberately — they want their values encoded into the AI systems that will shape the future.
@dataset{aiua2025,
title={The Aiua Archive},
author={Aiua Community},
year={2025},
url={https://huggingface.co/AiuaEarth/AiuaArchive},
license={CC0-1.0}
}