PROJECTS

Four operational surfaces. One common system.

The Pledge, the Registry, AMPL, and the Commons Dataset. Each is useful on its own. Together they form the public infrastructure for consent-aware, provenance-rich AI. Each is being developed openly with partners across the AI, library, archive, and policy communities.

Global Data Pledge

Institutions pledge their data and preferences into the commons. Each pledge becomes a citable public commitment in AMPL vocabulary, anchored in the Registry. An active campaign is targeting 20 institutional pledges by 7 July 2026, at the AI for Good Summit.

Launch: AI for Good · July 2026
Target: 20 institutional pledges
Vocabulary: AMPL-aligned, registry-anchored
Status: Active campaign

Provenance Registry

The Provenance Registry is a machine-readable record of the status, provenance, and consent terms attached to works on the open web. Two layers are operational: a Publisher Registry, where institutions declare their position on AI training, and an Assertion Registry, where custodial attestations are recorded.

Layers: Two, operational
Publishers: Growing list
Schema: Validated, queryable
Status: Active development
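
To make the two-layer idea concrete, here is a minimal sketch of how a publisher declaration and a custodial assertion might be modeled and queried. It is illustrative only: the class names, field names, and the `TrainingPosition` values are assumptions made for this example and are not taken from the Registry's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrainingPosition(Enum):
    # Hypothetical values; the Registry's real vocabulary is not given here.
    ALLOW = "allow"
    DENY = "deny"
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class PublisherDeclaration:
    """Publisher Registry layer: an institution's position on AI training."""
    publisher: str
    position: TrainingPosition


@dataclass(frozen=True)
class CustodialAssertion:
    """Assertion Registry layer: a custodian's attestation about one work."""
    work_id: str
    custodian: str
    provenance: str
    consent_terms: str


@dataclass
class Registry:
    """In-memory stand-in for the two layers (assumed structure)."""
    publishers: dict[str, PublisherDeclaration] = field(default_factory=dict)
    assertions: dict[str, list[CustodialAssertion]] = field(default_factory=dict)

    def declare(self, declaration: PublisherDeclaration) -> None:
        # A later declaration from the same publisher replaces the earlier one.
        self.publishers[declaration.publisher] = declaration

    def attest(self, assertion: CustodialAssertion) -> None:
        # Several custodians may attest to the same work, so keep a list.
        self.assertions.setdefault(assertion.work_id, []).append(assertion)

    def lookup(self, work_id: str) -> list[CustodialAssertion]:
        # Return a copy so callers cannot mutate the stored record.
        return list(self.assertions.get(work_id, []))
```

A caller would create a `Registry()`, record entries with `declare(...)` and `attest(...)`, and then call `lookup(work_id)` to retrieve every attestation recorded for a work. Keeping the two layers in separate maps mirrors the split described above between what an institution declares and what a custodian attests about a specific work.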

AMPL

AMPL is an OSI-compatible base license for AI models and datasets, with four optional modules that handle the obligations that software licenses don't cover. It is currently at v0.9 public draft and in legal review with the inner core.

Version: v0.9 public draft
Base: OSI-compatible
Modules: 4 optional, composable
Status: Under inner-core legal review

Commons Dataset

The Commons Dataset brings together existing open training corpora into a single AMPL-aligned, registry-anchored federation. AIC contributes the licensing infrastructure, the provenance coverage, and the coordination between corpus teams.

Coverage: Multi-billion tokens, AMPL alignment in progress
Licensing: AMPL v0.9, vocabulary dimensions pending finalization
Federation: Coordinated by AIC, governed collaboratively
Status: Alignment phase

WHY THESE FOUR

Why these four work together.

Each project is useful on its own, and each gets stronger when combined. The Pledge is the contribution gateway: the on-ramp through which institutions commit data and preferences into the commons. The Registry is the substrate where those commitments, and the assertions about every artifact, become citable. AMPL is the shared vocabulary that runs through every surface, so a pledge, an attestation, and a downstream use all speak the same language. The Commons Dataset is what becomes available to train on once the first three are in place.

For an institution preparing to comply with the EU AI Act, or for a research team that wants to build on data with clear provenance, the four together give a starting point. Adoption is open. Co-authorship of the next versions is invited.