Global Data Pledge

A commitment to make accessible data available, and to make reciprocity real. Most of the high-value training data the world needs already exists. It sits in government archives, public broadcaster libraries, university collections,…

Launch: AI for Good · July 2026
Two asks: Availability + Reciprocity
Signatories sought: Governments · Publishers · Archives
Status: Commitments opening

The Global Data Pledge is the contribution gateway for the readability layer. Institutions that pledge data and preferences into the commons enter the system as rights-holder asserters in the Provenance Registry. Each pledge is captured in AMPL vocabulary, made citable through a live public registry, and routed into the Commons Dataset where the pledged corpus satisfies the cleared criteria. The campaign is currently targeting 20 institutional pledges by 7 July 2026. The pledge is the front door for institutional contribution to the entire stack.

The Pledge

We, the undersigned, commit to making the publicly accessible data our institutions hold available for AI training — and to doing so under terms that ensure the value our contribution creates flows back to the communities that produced it.

We recognize that our holdings — government records, public broadcasting archives, library collections, scientific publications, public-domain texts, openly licensed creative works — are part of the substrate from which AI is built. We commit to making them accessible through formats that enable lawful AI training, aligned with the AMPL vocabulary, and recorded in the AI Commons Registry so that downstream use is traceable, auditable, and reciprocal.

What the pledge actually commits

Two distinct asks. Both operational.

The pledge is built around two separable but related commitments. Institutions can sign one without the other; full signatories sign both.

Ask 01 · Availability
Make accessible data available for AI training.
Publish your holdings — government data, public broadcaster archives, library collections, scientific publications, public-domain texts — in formats that enable lawful AI training. Document provenance. Provide bulk-access methods that don't require scraping. Where contractual or copyright constraints apply, make the constraints machine-readable.
Ask 02 · Reciprocity
Ensure contribution flows back to source communities.
Align your contributions with AMPL, the shared vocabulary that lets reciprocity obligations travel downstream. Record your contributions in the AI Commons Registry so they are auditable. Where appropriate, attach use-case dimensions that protect source communities from extraction. Contribution is a decision, not an extraction.
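As a rough illustration of the two asks, the sketch below shows one way an institution might describe a holding, its provenance, a bulk-access method, and any constraints in machine-readable form. This is not the AMPL vocabulary or the registry's actual schema, neither of which is specified here. Every class name, field name, and value (`PledgeRecord`, `Constraint`, `bulk_access_url`, the example library and URL) is a hypothetical placeholder.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Constraint:
    # Hypothetical: "kind" might be "copyright" or "contractual".
    kind: str
    description: str


@dataclass
class PledgeRecord:
    # All field names are illustrative assumptions, not AMPL terms.
    institution: str
    holding: str
    bulk_access_url: str  # bulk access, so nobody needs to scrape
    provenance: str       # where the material came from
    constraints: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Constraint objects to dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = PledgeRecord(
    institution="Example National Library",
    holding="Digitized public-domain texts",
    bulk_access_url="https://example.org/bulk/public-domain.tar",
    provenance="Scanned in-house from the library's own collection",
    constraints=[
        Constraint(kind="contractual",
                   description="Non-commercial research use only"),
    ],
)

print(record.to_json())
```

The point of the sketch is only that constraints live alongside the data description as structured fields, so a downstream user can read them programmatically instead of inferring them from a terms-of-use page.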
Why now

The data winter is real. The thaw requires commitment.

Researchers have documented a decline in what's been called the AI Data Commons — the corpus of openly accessible material that public-interest, research, and open-model AI development depends on. Web scraping under contested legal conditions has produced a backlash; publishers are walling off content; opt-outs are accelerating. The current trajectory will leave open-data AI development structurally disadvantaged against the closed-data frontier.

The data winter is not an inevitability. It is a coordination failure. Most of the institutions holding high-value data want their work to power public-interest AI. What's missing is the framework that lets them contribute without contributing to extraction.

The Data Pledge provides that framework. Availability without reciprocity is extraction. Reciprocity without availability is performance. Together, coordinated through AMPL and the registry, they are how a public-interest data commons gets built at the scale the moment requires.

Who can sign

Four kinds of institutions. Different asks.

The pledge is structured so that different institutions can commit at the level appropriate to their holdings and authority. Each tier has clear, operational obligations.

01 · Governments
Make public data AI-accessible.
National statistical offices, government data portals, public records, public broadcasters. Commit to bulk access, AMPL alignment, and registry coverage. Lead by example in the data your taxpayers already own.
02 · Memory institutions
Open the archives.
Libraries, archives, museums, universities. Public-domain holdings, openly licensed material, institutional collections. Stewards of human knowledge committing to make that knowledge usable for AI in the public interest.
03 · Publishers
Contribute the consented corpus.
Where consent exists or can be obtained (open access journals, public-domain catalogs, freely licensed content), publishers can contribute substantive holdings without contradicting their commercial models. The pledge defines the boundary.
04 · Rightsholder collectives
Establish reciprocity at the source.
Collective management organizations, creator unions, and content collectives represent the rightsholders whose contributions need protection. The pledge gives them a collective voice in defining what reciprocity actually means.
In conversation now

First signatories being assembled.

The pledge launches at AI for Good in July 2026. Conversations with founding signatories are active now. Institutions currently engaged:

PleIAs
Common Corpus steward · founding corpus partner
In conversation
Internet Archive
Wayback Machine · public-domain holdings
Founding attestation signed
EleutherAI
Common Pile · openly licensed corpus
In conversation
Harvard IDI
Institutional Books · gated-access commons
Methodology aligned
Wikimedia
Wikipedia and sister projects
Working session opening
Public broadcasters
Public AI Network coordination
Network-level conversation
How to engage

Sign. Convene. Co-shape.

The pledge is in formation. Founding signatories are being assembled now. The text is open for input from prospective signatories before the formal launch.

If you can sign
Become a founding signatory.
Founding signatories will be announced at AI for Good 2026. The threshold for founding status is a meaningful commitment to both availability and reciprocity. Contact us to begin the conversation.
If you can convene
Host a pledge convening.
Regional convenings, sector-specific working groups, jurisdictional adaptations — the pledge needs local stewards. If you can bring an audience to the table, we provide the framework, briefing material, and follow-up coordination.
If you can shape
Review the text.
Legal counsel, policy advisors, community advocates — the pledge text is in active drafting. We need input from people who can identify edge cases, jurisdictional concerns, and reciprocity gaps before the public launch.
If you can fund
Underwrite the campaign.
Funder support enables the convenings, the policy briefings, and the coordination work that turns a pledge into an actual movement. Contact funding@aicommons.org for the campaign brief.