A collection of open data sets for all sorts of interesting things.
General Data Sets
UCSF Industry Documents
Massive digital archive of documents created by industries which influence public health, such as the tobacco, chemical, drug and fossil fuel industries.
A searchable database that collects and analyzes legal complaints and requests for removal of online materials (DMCA take-downs), helping Internet users to know their rights and understand the law.
Large archive of sociological surveys conducted in African countries over the last ~20 years.
Large archive of sociological surveys conducted in the Arab countries of Africa and the Middle East from 2007 to 2021.
A censorship measurement platform that collects data using multiple remote measurement techniques in more than 200 countries. Provides reports and offers its raw data sets for download.
Massive dataset of over 600 million domains. Total size is ~16 GB.
Face Recognition Datasets
A large collection of face datasets for training facial recognition systems and similar applications.
Offers over 50,000 public datasets covering a wide range of topics.
An open repository of web crawl data that can be accessed and analyzed by anyone.
CORE Research Paper Database
CORE currently contains 207,255,818 searchable open access articles and research papers collected from 10,286 data providers around the world, which you can search using keywords.
An international, collaborative research project aimed at aggregating the collective work of independent researchers around the globe who wish to defend the public’s right to access information.
Government Data Sets
The US FOIA library. Contains over 6,700 scanned FOIA documents.
CIA Reading Room
Search and view documents released through the FOIA and other CIA release programs.
US National Archives
An independent agency of the United States government charged with the preservation and documentation of government and historical records. It is also tasked with increasing public access to the documents that make up the National Archives.
UK National Archives
One of the world's largest archives, containing over 11 million historical government and public records, from the Domesday Book to modern government files. Includes paper records, digital records, websites, photographs, posters, maps, drawings and paintings.
A digital repository of government records declassified under the Canadian Access to Information Act. Spans from 1945 through 1991.
A gateway to over 800 archival repositories across Canada.
The Information Commissioner's Office (ICO) upholds information rights in the public interest, promoting openness by public bodies and data privacy for individuals. ICO is an executive non-departmental public body, sponsored by the Department for Digital, Culture, Media and Sport.
David McKie Open Data Portals
This is a great collection of Canadian open data portals, both federal and provincial. This site also provides some other useful non-Canadian data sets.
Leaked Data Sets
An international non-profit organization that publishes leaks and classified media from governments, companies and organizations alike. All data is provided by anonymous sources. #FreeAssange.
Publishes documents that are prohibited by governments worldwide, in particular material on freedom of expression, privacy, cryptology, dual-use technologies, national security, intelligence and secret governance, including open, secret and classified documents.
ICIJ Luxembourg Leaks
Also known as "LuxLeaks", this is a collection of over 350 documents about Luxembourg tax rulings set up by PricewaterhouseCoopers from 2002 to 2010 for the benefit of its clients.
Distributed Denial of Secrets
A journalistic 501(c)(3) non-profit devoted to enabling the free transmission of data in the public interest. Aims to avoid political, corporate or personal leanings and to act as a beacon of available information.
An organized and importable .html bookmark file that includes everything listed on this page.
A PDF copy that contains everything on this page for offline use. Updated - 4/12/2021.