
Global Data Pledge
A commitment to make accessible data available, and to make reciprocity real. Most of the high-value training data the world needs already exists. It sits in government archives, public broadcaster libraries, university collections,…
The Global Data Pledge is the contribution gateway for the readability layer. Institutions that pledge data and preferences into the commons enter the system as rights-holder asserters in the Provenance Registry. Each pledge is captured in AMPL vocabulary, made citable through a public live registry, and routed into the Commons Dataset where the pledged corpus satisfies the cleared criteria. The campaign is currently targeting 20 institutional pledges by 7 July 2026, with the pledge serving as the front door for institutional contribution to the entire stack.
We, the undersigned, commit to making the publicly accessible data our institutions hold available for AI training — and to doing so under terms that ensure the value our contribution creates flows back to the communities that produced it.
We recognize that our holdings — government records, public broadcasting archives, library collections, scientific publications, public-domain texts, openly licensed creative works — are part of the substrate from which AI is built. We commit to making them accessible through formats that enable lawful AI training, aligned with the AMPL vocabulary, and recorded in the AI Commons Registry so that downstream use is traceable, auditable, and reciprocal.
Two distinct asks. Both operational.
The pledge is built around two separable but related commitments. Institutions can sign one without the other; full signatories sign both.
The data winter is real. The thaw requires commitment.
Researchers have documented a decline in what's been called the AI Data Commons — the corpus of openly accessible material that public-interest, research, and open-model AI development depends on. Web scraping under contested legal conditions has produced a backlash; publishers are walling off content; opt-outs are accelerating. The current trajectory will leave open-data AI development structurally disadvantaged against the closed-data frontier.
The Global Data Pledge provides a framework for changing that trajectory. Availability without reciprocity is extraction. Reciprocity without availability is performance. Together, coordinated through AMPL and the registry, they are how a public-interest data commons gets built at the scale the moment requires.
Four kinds of institutions. Different asks.
The pledge is structured so that different institutions can commit at the level appropriate to their holdings and authority. Each tier has clear, operational obligations.
First signatories being assembled.
The pledge launches at AI for Good in July 2026. Conversations with founding signatories are active now. Institutions in development:
Sign. Convene. Co-shape.
The pledge is in formation. Founding signatories are being assembled now. The text is open for input from prospective signatories before the formal launch.