OH SHINT! It's A Blog!

Data Sets

A collection of open data sets for all sorts of interesting things.

Last updated 2 years ago

General Data Sets

  • UCSF Industry Documents: Massive digital archive of documents created by industries that influence public health, such as the tobacco, chemical, drug and fossil fuel industries.

  • Lumen Database: A searchable database that collects and analyzes legal complaints and requests for removal of online materials (DMCA take-downs), helping Internet users know their rights and understand the law.

  • National Center for Biotechnology Information: Allows you to search 39 different scientific databases such as PubMed, SRA, OMIM, MedGen and more from a single page.

  • Afrobarometer: Large archive of sociological surveys conducted in African countries over the last ~20 years.

  • Arabbarometer: Large archive of sociological surveys conducted in the Arab countries of Africa and the Middle East from 2007 to 2021.

  • CensoredPlanet: A censorship measurement platform that collects data using multiple remote measurement techniques in more than 200 countries. Provides reports and offers its raw data sets for download.

  • DomainsProject: Massive dataset of over 600 million domains. Total size is ~16 GB.

  • Face Recognition Datasets: A large collection of face datasets for training facial recognition systems and similar applications.

  • Kaggle: Offers over 50,000 public datasets on a wide range of topics.

  • Common Crawl: An open repository of web crawl data that can be accessed and analyzed by anyone.

  • OCCRP Catalogue of Research Databases: A massive collection of public data sources compiled by OCCRP researchers, selected as the most useful for investigative reporting.

  • CORE Research Paper Database: CORE currently contains 207,255,818 searchable open access articles and research papers collected from 10,286 data providers around the world, which you can search using keywords.

  • Public Intelligence: An international, collaborative research project aimed at aggregating the collective work of independent researchers around the globe who wish to defend the public’s right to access information.

  • GIJC21 Resources: A large list of investigative resources from the GIJN 2021 conference. There is some very useful content in here.

  • Google Dataset Search: A search engine from Google for all kinds of data sets.

  • MartinDale: Search for attorneys and related articles.

  • HuggingFace: Offers models based on transformers for PyTorch and TensorFlow 2.0. There are thousands of pre-trained models to perform tasks such as text classification, extraction, question answering, and more.

  • Information Operations Archive: An archive of publicly available and attributed data from known online information operations. The archive currently consists of over 10 million messages from Russian and Iranian state-sponsored influence operations on Twitter and Reddit, and will be updated on an ongoing basis.

  • Bloopbase Searchable Data Dumps: A collection of searchable data dumps. Includes law enforcement, government, extremist, conventional, commercial, and exploit databases.

  • AI Incident Database: A collection of AI deployment harms or near harms across all disciplines, geographies, and use cases.

  • Grants4Teachers: Teacher grant search database.

  • DataReportal: An online reference library offering hundreds of free reports packed with data, insights, and trends.

  • openAFRICA: Aims to be the largest independent repository of open data on the African continent.

Government Data Sets

  • FBI Vault: The US FOIA library. Contains over 6,700 scanned FOIA documents.

  • CIA Reading Room: Search and view documents released through the FOIA and other CIA release programs.

  • US Department of State Records: Search through 221,373 documents reviewed and released to the public.

  • US National Archives: An independent agency of the United States government charged with the preservation and documentation of government and historical records. It is also tasked with increasing public access to the documents that make up the National Archive.

  • US Library of Congress: The Library of Congress is the research library that officially serves the United States Congress and is the national library of the United States. It is the oldest federal cultural institution in the U.S.

  • UK National Archives: One of the world's largest archives, containing over 11 million historical government and public records, from Domesday Book to modern government files. Includes paper records, digital records, websites, photographs, posters, maps, drawings and paintings.

  • Canada Declassified: A digital repository of government records declassified under the Canadian Access to Information Act. Spans from 1945 through 1991.

  • Archives Canada: A gateway to over 800 archival repositories across Canada.

  • Australian National Archives Search: A tool to search the national archives of Australia. Includes an advanced search function.

  • ICO Search: The Information Commissioner's Office (ICO) upholds information rights in the public interest, promoting openness by public bodies and data privacy for individuals. ICO is an executive non-departmental public body, sponsored by the Department for Digital, Culture, Media and Sport.

  • David McKie Open Data Portals: A great collection of Canadian open data portals, both federal and provincial. The site also provides some other useful non-Canadian data sets.

  • SpatialHub Scotland Datasets: A collection of various datasets for Scotland.

  • Netronline Public Records: An online directory and portal to the U.S. tax assessors', treasurers' and recorders' offices that have developed websites for retrieving available public records.

  • BlackBookOnline: A large database and search tool for locating U.S. public records. Find everything from parking tickets to property records.

  • U.S. Traffic Cameras and Reports: Covers state, county and city traffic cameras, as well as how to access and file traffic accident reports. Many state, county and city police departments provide forms online for filing accident reports and some even provide online searchable accident databases.

  • MuckRock: Helps thousands of users file, track, and share public and FOIA record requests at the state, local, and federal levels within the United States, and produces original reporting on government transparency.

Leaked Data Sets

  • WikiLeaks: An international non-profit organization that publishes leaks and classified media from governments, companies and organizations alike. All data is provided by anonymous sources. #FreeAssange.

  • CryptoMe: Publishes documents that are prohibited by governments worldwide. Particularly material on freedom of expression, privacy, cryptology, dual-use technologies, national security, intelligence, secret governance, open, secret and/or classified documents.

  • ICIJ Offshore Leaks: Data from more than 785,000 offshore companies, foundations and trusts from the Panama Papers, Offshore Leaks, Bahamas Leaks, and the Paradise Papers.

  • ICIJ Luxembourg Leaks: Also known as "LuxLeaks", a collection of over 350 documents about Luxembourg tax rulings set up by PricewaterhouseCoopers from 2002 to 2010 for the benefit of its clients.

  • Distributed Denial of Secrets: A journalistic 501(c)(3) non-profit devoted to enabling the free transmission of data in the public interest. Aims to avoid political, corporate or personal leanings, to act as a beacon of available information.

  • WikiSpooks: An open source encyclopedia of deep politics.

Data_Set_Bookmarks_24-2-2022_ohshint.html (49KB): An organized and importable .html bookmark file that includes everything listed on this page.

Data_Set_Resources_ohshint.pdf (36KB): A PDF copy that contains everything on this page for offline use. Updated 24/9/2022.
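
If you would rather pull the links out of the bookmark file than import it into a browser, something like the following rough sketch might do the job. It assumes the .html file uses the common Netscape bookmark layout (links stored as `<A HREF="...">` tags), which has not been checked against the actual download. The names `BookmarkParser`, `extract_bookmarks` and `SAMPLE` are made up for illustration, and the sample links are placeholders, not entries from the real file.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (title, url) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_bookmarks(html_text):
    """Return a list of (title, url) tuples found in bookmark HTML."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample in the assumed Netscape bookmark layout.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><H3>General Data Sets</H3>
    <DL><p>
        <DT><A HREF="https://example.com/one">Example One</A>
        <DT><A HREF="https://example.com/two">Example Two</A>
    </DL><p>
</DL><p>
"""

if __name__ == "__main__":
    for title, url in extract_bookmarks(SAMPLE):
        print(f"{title}\t{url}")
```

To try it on the real download, read Data_Set_Bookmarks_24-2-2022_ohshint.html into a string (UTF-8 is a guess at the encoding) and pass it to `extract_bookmarks` in place of `SAMPLE`.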