PROJECTS

Four operational surfaces. One common system.

The Pledge, the Registry, AMPL, and the Commons Dataset. Each is useful on its own. Together they form the public infrastructure for consent-aware, provenance-rich AI. Each is being developed openly with partners across the AI, library, archive, and policy communities.

01
Global Data Pledge
Institutions pledge their data and preferences into the commons. Each pledge becomes a citable public commitment expressed in AMPL vocabulary and anchored in the Registry. An active campaign is targeting 20 institutional pledges by 7 July 2026, at the AI for Good Summit.
Launch: AI for Good · July 2026
Target: 20 institutional pledges
Vocabulary: AMPL-aligned, registry-anchored
Status: Active campaign
02
Provenance Registry
The Provenance Registry is a machine-readable record of the status, provenance, and consent terms attached to works on the open web. Two layers are operational: a Publisher Registry, where institutions declare their position on AI training, and an Assertion Registry, where custodial attestations are recorded.
Layers: Two, operational
Publishers: Growing list
Schema: Validated, queryable
Status: Active development
03
AMPL
AMPL is an OSI-compatible base license for AI models and datasets, with four optional modules that handle the obligations software licenses don't cover. It is currently at v0.9 public draft and under legal review with the inner core.
Version: v0.9 public draft
Base: OSI-compatible
Modules: 4 optional, composable
Status: Under inner-core legal review
04
Commons Dataset
The Commons Dataset brings together existing open training corpora into a single AMPL-aligned, registry-anchored federation. AIC contributes the licensing infrastructure, the provenance coverage, and the coordination between corpus teams.
Coverage: Multi-billion tokens, AMPL alignment in progress
Licensing: AMPL v0.9, vocabulary dimensions pending finalization
Federation: Coordinated by AIC, governed collaboratively
Status: Alignment phase
WHY THESE FOUR

Why these four work together.

Each project is useful on its own, and each gets stronger when combined with the others. The Pledge is the contribution gateway: the on-ramp through which institutions commit data and preferences into the commons. The Registry is the substrate where those commitments, and the assertions about every artifact, become citable. AMPL is the shared vocabulary that runs through every surface, so a pledge, an attestation, and a downstream use all speak the same language. The Commons Dataset is what becomes available to train on once the first three are in place.

For an institution preparing to comply with the EU AI Act, or for a research team that wants to build on data with clear provenance, the four together give a starting point. Adoption is open. Co-authorship of the next versions is invited.