Research Datasets: Difference between revisions

From Irregularpedia
Initial
 
Tag: wikieditor
 
(7 intermediate revisions by the same user not shown)
Line 2: Line 2:
= Datasets =
= Datasets =


'' [[#Finding%20Datasets|Finding Datasets]] - [[#Advanced%20Google%20Query%20for%20Datasets|Advanced Google Query for Datasets]]
''[[#Finding%20Datasets|Finding Datasets]] - [[#Advanced%20Google%20Query%20for%20Datasets|Advanced Google Query for Datasets]]
'' [[#Known%20Datasets|Known Datasets]]
''[[#Known%20Datasets|Known Datasets]]


Datasets to use for Research (See [[research|Research wiki]])
Datasets to use for Research (See [[research|Research wiki]])
Line 12: Line 12:
== Finding Datasets ==
== Finding Datasets ==


'' [https://datasetsearch.research.google.com/ Google Data Set Search]
* [https://datasetsearch.research.google.com/ Google Data Set Search]
'' [https://scholar.google.com/schhp?hl=en Google Scholar]Google Search Power for Academic Writing<br />
* [https://scholar.google.com/schhp?hl=en Google Scholar] Google Search Power for Academic Writing
 
* [https://www.jstor.org/ JSTOR] digital library of academic journals, books, and primary sources
'' [https://www.jstor.org/ JSTOR] digital library of academic journals, books, and primary sources
* [https://www.researchgate.net/ Research Gate] Massive Database of Academic Journals
'' [https://www.researchgate.net/ Research Gate] Massive Database of Academic Journals
* [https://www.google.com/search?q=site%3A.edu+%22free%22+%28%22research%22+or+%22dataset%22 Google Dork] for Academic Research Resources
'' [https://www.google.com/search?q=site%3A''.edu+%22free%22+%28%22research%22+or+%22dataset%22 Google Dork] for Academic Research Resources
* [https://elicit.org Elicit] AI journal search.
'' [https://elicit.org Elicit] AI journal search.
* [https://osf.io/ Open Science Framework (OSF)] is a free and open-source platform designed to support research and collaboration across the research life cycle.
'' [https://osf.io/ Open Science Framework (OSF)] is a free and open-source platform designed to support research and collaboration across the research life cycle.
* [https://github.com/awesomedata/awesome-public-datasets Awesome Public Datasets] A curated list of over 40,000 public datasets across various topics.
'' [https://github.com/awesomedata/awesome-public-datasets Awesome Public Datasets] A curated list of over 40,000 public datasets across various topics.
* [https://github.com/hslatman/awesome-threat-intelligence Awesome Cyber Threat Datasets] provide context, mechanisms, indicators, implications, and actionable advice about an existing or emerging menace or hazard to assets that can inform decisions regarding the subject’s response to that menace or hazard.
'' [https://github.com/hslatman/awesome-threat-intelligence Awesome Cyber Threat Datasets] provide context, mechanisms, indicators, implications, and actionable advice about an existing or emerging menace or hazard to assets that can inform decisions regarding the subject’s response to that menace or hazard_. 
* [https://www.reddit.com/r/datasets/ r/datasets]
'' [https://www.reddit.com/r/datasets/ r/datasets]
* [https://www.reddit.com/r/Dissertation/ r/Dissertation]
'' [https://www.reddit.com/r/Dissertation/ r/Dissertation]
* [https://www.reddit.com/r/AskAcademia/ r/AskAcademia]
'' [https://www.reddit.com/r/AskAcademia/ r/AskAcademia]
* [https://www.reddit.com/r/GradSchool/ r/GradSchool]
'' [https://www.reddit.com/r/GradSchool/ r/GradSchool]


<span id="queries-for-datasets"></span>
<span id="queries-for-datasets"></span>
=== Queries for Datasets ===
=== Queries for Datasets ===


<pre class="copy-search">&quot;Search_TERM_HERE&quot; site:vision.in.tum.de OR site:www.cdbb.cam.ac.uk OR site:bimportal.scottishfuturestrust.org.uk OR site:digicatapult.org.uk OR site:pewresearch.org OR site:odsc.com OR site:archive.ics.uci.edu OR site:research.tudelft.nl OR site:archive.data.jhu.edu OR site:systems.jhu.edu</pre>
<pre class="copy-search">"Search_TERM_HERE" site:vision.in.tum.de OR site:www.cdbb.cam.ac.uk OR site:bimportal.scottishfuturestrust.org.uk OR site:digicatapult.org.uk OR site:pewresearch.org OR site:odsc.com OR site:archive.ics.uci.edu OR site:research.tudelft.nl OR site:archive.data.jhu.edu OR site:systems.jhu.edu</pre>
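The query above is a quoted search term followed by a list of `site:` filters joined with `OR`. The following is a minimal Python sketch of how such a query could be assembled and turned into a search link. The helper names `build_dataset_query` and `to_search_url` are hypothetical and not part of this wiki; the site list is copied from the query above, and the `https://www.google.com/search?q=` URL form matches the Google Dork link in the Finding Datasets list.

```python
from urllib.parse import quote_plus

# Sites copied from the query in this section.
SITES = [
    "vision.in.tum.de",
    "www.cdbb.cam.ac.uk",
    "bimportal.scottishfuturestrust.org.uk",
    "digicatapult.org.uk",
    "pewresearch.org",
    "odsc.com",
    "archive.ics.uci.edu",
    "research.tudelft.nl",
    "archive.data.jhu.edu",
    "systems.jhu.edu",
]


def build_dataset_query(term, sites=SITES):
    """Return '"term" site:a OR site:b ...' (hypothetical helper)."""
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    return f'"{term}" {site_filter}'


def to_search_url(query):
    """URL-encode the query string into a Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_dataset_query("air pollution")
    print(query)
    print(to_search_url(query))
```

Keeping the sites in a list makes it easier to add or drop a source without retyping the whole query by hand.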
 
<span id="case-studies-and-projects-using-datasets"></span>
<span id="case-studies-and-projects-using-datasets"></span>
=== Case Studies and Projects Using Datasets ===
=== Case Studies and Projects Using Datasets ===


'' [https://github.com/vizdata-f21/project-2-tidy_team?tab=readme-ov-file 2 Tidy] interactive spatio-temporal visualization of worldwide deaths related to various risk factors, specifically air pollution, substance use, and lack of sanitation. ## Known Datasets
* [https://github.com/vizdata-f21/project-2-tidy_team?tab=readme-ov-file Global Deaths by Risk Factors Github Repo] interactive spatio-temporal visualization of worldwide deaths related to various risk factors, specifically air pollution, substance use, and lack of sanitation.


{| class="wikitable"
== Known Datasets ==
 
{| class="wikitable sortable"
|-
|-
! style="text-align: left;"| URL
! style="text-align: left;"| URL
! Comments 
! Comments
! style="text-align: right;"| Free (Y/N)
! style="text-align: right;"| Free (Y/N)
! Category
! Category
Line 45: Line 47:
|-
|-
| style="text-align: left;"| [https://library.si.edu/research/free-databases-and-collections Smithsonian Library Resources]
| style="text-align: left;"| [https://library.si.edu/research/free-databases-and-collections Smithsonian Library Resources]
| This list includes databases, collections and search tools, selected by Smithsonian Libraries staff, that are freely available via the Internet. 
| This list includes databases, collections and search tools, selected by Smithsonian Libraries staff, that are freely available via the Internet.
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Academic
| Academic
| Global
| Global
|-
|-
| style="text-align: left;"| [http://cross-sub.org/ CrossSub]<br> [https://yz-data.shinyapps.io/xsub/ Alt Link]
| style="text-align: left;"| [http://cross-sub.org/ CrossSub]<br>[https://yz-data.shinyapps.io/xsub/ Alt Link]
| micro-level, subnational event data on armed conflict and contention around the world 
| micro-level, subnational event data on armed conflict and contention around the world
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Conflict
| Conflict
Line 80: Line 82:
| LATAM - Mexico
| LATAM - Mexico
|-
|-
| style="text-align: left;"| [https://www.google.com/search?q=inurl%3Ahttps%3A%2F%2Fsimplemaps.com%2Fdata%2F''-cities+%22COUNTRY-HERE%22&newwindow=1&client=firefox-b-1-d&sxsrf=ALiCzsaBIS8xQeZg9SWV58kErpaH3B1Ygg%3A1651200770193&ei=AlNrYoS2C5-LytMPiYONqAw&ved=0ahUKEwiEv_Caorj3AhWfhXIEHYlBA8UQ4dUDCA0&uact=5&oq=inurl%3Ahttps%3A%2F%2Fsimplemaps.com%2Fdata%2F*-cities+%22COUNTRY-HERE%22&gs_lcp=Cgdnd3Mtd2l6EAM6BwgAEEcQsANKBAhBGABKBAhGGABQylJYh5MBYJqaAWgCcAF4AIABdogBzQuSAQQ0LjEwmAEAoAEByAEIwAEB&sclient=gws-wiz World City Database]
| style="text-align: left;"| [https://www.google.com/search?q=inurl%3Ahttps%3A%2F%2Fsimplemaps.com%2Fdata%2F' World City Database]
| Database of cities with information of population and general Lat Long 
| Database of cities with information of population and general Lat Long
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Country Data
| Country Data
Line 119: Line 121:
| Foreign Direct Investment (FDI) Statistics
| Foreign Direct Investment (FDI) Statistics
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Finance &amp; Business
| Finance & Business
| Global
| Global
|-
|-
| style="text-align: left;"| [https://data.worldbank.org/ World Bank Data]
| style="text-align: left;"| [https://data.worldbank.org/ World Bank Data]
| Economica Datasets
| Economic Datasets
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Finance &amp; Business
| Finance & Business
| Global
| Global
|-
|-
| style="text-align: left;"| [https://bimportal.scottishfuturestrust.org.uk/page/roi-calculator Scottish Futures Trust ROI Calculator] 
| style="text-align: left;"| [https://bimportal.scottishfuturestrust.org.uk/page/roi-calculator Scottish Futures Trust ROI Calculator]
| Calculator that allows the user to calculate the expected return on investment of a building project
| Calculator that allows the user to calculate the expected return on investment of a building project
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Finance &amp; Business
| Finance & Business
|
|
|-
|-
Line 137: Line 139:
| cost of living calculator and comparison tool. Useful for determining the average price around the world.
| cost of living calculator and comparison tool. Useful for determining the average price around the world.
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Finance &amp; Business
| Finance & Business
| Global
| Global
|-
|-
| style="text-align: left;"| [https://ai.reportlinker.com/pricing Reportlinker]
| style="text-align: left;"| [https://ai.reportlinker.com/pricing Reportlinker]
| AI enabled Market Intelligence Platform
| AI enabled Market Intelligence Platform
| style="text-align: right;"| N
| Finance &amp; Business
|
|-
| style="text-align: left;"| [https://archive.ics.uci.edu/ml/datasets/BitcoinHeistRansomwareAddressDataset BitcoinHeist Ransomware Address Dataset]
| Contains addresses labeled as belonging to one of the four categories: White, Gray, Black, or Unknown.
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Finance &amp; Business
| Finance & Business
|
| Global
|-
|-
| style="text-align: left;"| [https://robotcar-dataset.robots.ox.ac.uk Oxford Robot Car Dataset]
| style="text-align: left;"| [https://www.kaggle.com/datasets Kaggle]
| Dataset for autonomous driving research
| Data repository with many datasets for competitions
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| General
| Machine Learning
|
| Global
|-
|-
| style="text-align: left;"| [https://www.cdbb.cam.ac.uk/research/data-science-artificial-intelligence-machine-learning CDBB Data Science and AI Research]
| style="text-align: left;"| [https://www.youtube.com/ YouTube]
| Research on data science, AI and machine learning, includes datasets
| Video Searchable Database of Machine Learning Videos
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| General
| Machine Learning
|
| Global
|-
| style="text-align: left;"| [https://www.digicatapult.org.uk/services/innovation-and-acceleration/ Digital Catapult Innovation and Acceleration]
| Helps businesses bring new products and services to market
| style="text-align: right;"| Y
| General
|
|-
| style="text-align: left;"| [https://odsc.com Data Science Conference (ODSC)]
| Conference that brings together the data science community, including datasets and other resources
| style="text-align: right;"| N
| General
|
|-
|-
| style="text-align: left;"| [https://partyfacts.herokuapp.com/data/ Party Facts Datasets]
| style="text-align: left;"| [https://www.amazon.com/AWS/Amazon-S3 Amazon S3]
| The Party Facts project is a gateway to empirical data about political parties and a modern online platform about parties and their history as recorded in social science datasets. It uses social media technologies to create a collaborative data infrastructure following an approach to collect data successfully applied by the [https://eol.org/ Encyclopedia of Life] (EOL).
| Various datasets provided by Amazon Web Services
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Politics
| Data Storage
| Global
| Global
|-
|-
| style="text-align: left;"| [https://www.chesdata.eu/chesla Chapel Hill Expert Survey for Latin America]
| style="text-align: left;"| [https://r-nd.ami.btu.de/ AMI Data Set]
| Administered in 2020 and completed by 160 experts specializing in political parties, the 2020 CHES LA dataset provides information about the positioning of 112 political parties and presidents on political ideology, policy positions, party characteristics, and party linkages. The survey covers political parties and presidents in 12 Latin American countries. The
| This dataset comprises data from a wide range of sources including the finance sector.
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Politics
| Data Storage
| LATAM
|-
| style="text-align: left;"| [https://www.interpol.int/en/How-we-work/Databases Interpol Datasets]
| Police need up-to-date global data on criminals in order to carry out successful international investigations.
| style="text-align: right;"| N
| Politics &amp; Law
| Global
| Global
|-
|-
| style="text-align: left;"| [https://github.com/riceissa/global-sanctions-data Global Sanctions Dataset]
| style="text-align: left;"| [https://bostondata.org/ Boston Data]
| A compilation of international sanctions against countries and entities. 
| Boston city datasets
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Politics &amp; Law
| City Data
| Global
| Boston
|-
|-
| style="text-align: left;"| [https://statistics.cepal.org/portal/cepalstat/dashboard.html?theme=1&lang=en Demographic and Social Dataset]
| style="text-align: left;"| [https://www.socialeurope.eu/ Social Europe]
|
| Social Data
| style="text-align: right;"|
| Populations &amp; People
| Global
|-
| style="text-align: left;"| '''[https://api.gdeltproject.org/api/v2/summary/summary/ GDelta]'''
| monitors the world’s '''TV broadcast, print, and web''' news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Populations &amp; People
| Social
| Global
| Global
|-
|-
| style="text-align: left;"| [https://www.fec.gov/data/ FEC]
| style="text-align: left;"| [https://www.census.gov/ US Census]
| US Voting Data
| Census Data
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Populations &amp; People
| Demographics
| USA
| USA
|-
|-
| style="text-align: left;"| [https://github.com/woosal1337/cia/tree/main/datasets Github CIAWorldFactbook] 
| style="text-align: left;"| [https://data.gov/ US Government Data]
| CIA World Fact Book datasets
| Government Data
| style="text-align: right;"| Y
| Populations &amp; People
|
|-
| style="text-align: left;"| [https://www.statista.com/ Statista]
| Insights and facts across 170 industries and 150+ countries
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Populations &amp; People
| Government
|
|-
| style="text-align: left;"| [https://esoc.princeton.edu/data Princeton ESOC] 
| event, public opinion, and spatial data
| style="text-align: right;"| Y
| Populations &amp; People
|
|-
| style="text-align: left;"| [https://www.heavy.ai/demos/ships Heavy.ai:ships]
| shipping 
| style="text-align: right;"| Y
| Populations &amp; People
|
|-
| style="text-align: left;"| [https://www.latinobarometro.org/latContents.jsp Latino Baromitor]
|
| style="text-align: right;"|
| Populations &amp; People
| LATAM
|-
| style="text-align: left;"| [https://www.inegi.org.mx/ Mexican Socio-Economic Dataset]
|
| style="text-align: right;"| Y
| Populations &amp; People
| LATAM - Mexico
|-
| style="text-align: left;"| [https://app.ignitionrobotics.org/dashboard IgnitionRobotics] 
| Google Project to 3d model Objects 
| style="text-align: right;"| Y
| Scans
|
|-
| style="text-align: left;"| [https://systems.jhu.edu/research/space/ JHU Space Systems]
| Satellite datasets and tools for space systems research
| style="text-align: right;"| Y
| Science
|
|-
| style="text-align: left;"| [https://scl2-04-gpu03.mapd.com/ OmniSci Tweet Map]
| Tweet Data Map
| style="text-align: right;"| Y
| Social Media
|
|-
| style="text-align: left;"| [https://securingdemocracy.gmfus.org/hamilton-dashboard Hamilton Dashboard]
| Massive Amounts of Tweet Data especially covering Russia, China, and Iran
| style="text-align: right;"| Y
| Social Media
| Russia, China, and Iran
|-
| style="text-align: left;"| [https://www.pewresearch.org/download-datasets/ Pew Research Center Datasets] 
| Datasets on various social and political topics.
| style="text-align: right;"| Y
| Social Science
| USA
| USA
|-
|-
| style="text-align: left;"| [https://gitnux.org/ GitNux]
| style="text-align: left;"| [https://www.data.gov.uk/ UK Government Data]
|
| Government Data
| style="text-align: right;"|
| Various
| Global
|-
| style="text-align: left;"| [https://ourworldindata.org '''Our World In Data''']<br><br>[https://github.com/owid/owid-datasets/tree/master/datasets '''OWID GitHub Repo''']
| Research and data to make progress against the world’s largest problems.
| style="text-align: right;"| Y
| style="text-align: right;"| Y
| Various
| Government
| Global
| UK
|-
| style="text-align: left;"| [https://www.e-stat.go.jp/en/stat-search/?page=1 Japanese Datasets]
| Japanese e-Stat
| style="text-align: right;"| Y
| Various
| Japan
|-
| style="text-align: left;"| [https://archive.data.jhu.edu/ JHU Data Archive] 
| Datasets on social, health, and economic topics
| style="text-align: right;"| Y
| Various
|
|-
| style="text-align: left;"| [https://data.fivethirtyeight.com/ Datasets from FiveThirtyEight]
| Datasets used in FiveThirtyEight articles and graphics.
| style="text-align: right;"| Y
| Various
| USA
|}
|}


Convert to excel or csv if needed : https://tableconvert.com/markdown-to-excel
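As an offline alternative to the converter linked above, the wikitable source can be turned into CSV with a short script. This is a rough sketch, assuming the table is written one cell per line as on this page, with at most one leading attribute block such as `style="text-align: left;"|` per cell. The function names `wikitable_to_rows` and `rows_to_csv` are hypothetical, and link markup inside cells is left untouched.

```python
import csv
import io
import re


def wikitable_to_rows(wikitext):
    """Parse a simple one-cell-per-line wikitable into a list of rows."""
    rows, current = [], []
    for line in wikitext.splitlines():
        line = line.strip()
        # Table start, table end, and row separators all close the current row.
        if line.startswith(("{|", "|}", "|-")):
            if current:
                rows.append(current)
                current = []
        elif line.startswith(("!", "|")):
            cell = line[1:]
            # Drop one leading attribute block, e.g. style="text-align: left;"|
            cell = re.sub(r'^\s*[\w-]+="[^"]*"\s*\|', "", cell)
            current.append(cell.strip())
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = "\n".join([
        '{| class="wikitable sortable"',
        "|-",
        '! style="text-align: left;"| URL',
        "! Comments",
        "|-",
        '| style="text-align: left;"| [https://data.worldbank.org/ World Bank Data]',
        "| Economic Datasets",
        "|}",
    ])
    print(rows_to_csv(wikitable_to_rows(sample)))
```

The resulting CSV opens directly in Excel or any other spreadsheet tool. Cells that span several lines or carry multiple attributes would need a fuller parser.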
= Research Data Datasets =
 
[[Category:Research]]

Latest revision as of 00:57, 12 September 2024

Datasets

Finding Datasets - Advanced Google Query for Datasets - Known Datasets

Datasets to use for Research (See Research wiki)

Find additional datasets behind a login in the community dataset section

Finding Datasets

Queries for Datasets

Case Studies and Projects Using Datasets

Known Datasets

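{| class="wikitable sortable"
|-
! style="text-align: left;"| URL
! Comments
! style="text-align: right;"| Free (Y/N)
! Category
! Region
|-
| style="text-align: left;"| Smithsonian Library Resources
| This list includes databases, collections and search tools, selected by Smithsonian Libraries staff, that are freely available via the Internet.
| style="text-align: right;"| Y
| Academic
| Global
|-
| style="text-align: left;"| CrossSub<br>Alt Link
| Micro-level, subnational event data on armed conflict and contention around the world
| style="text-align: right;"| Y
| Conflict
| Global
|-
| style="text-align: left;"| ACLED
| Real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events around the world.
| style="text-align: right;"| N
| Conflict
| Global
|-
| style="text-align: left;"| OSMP
| Open Source Munitions Portal (OSMP): an open-source portal launched by Airwars and Arms Research. A useful database, particularly for anyone covering armed conflicts or wars.
| style="text-align: right;"| Y
| Conflict
| N/A
|-
| style="text-align: left;"| LiveUA
| Factual reporting on a variety of important topics, including conflicts, human rights issues, protests, terrorism, weapons deployment, health matters, natural disasters, and weather-related stories, from a vast array of sources
| style="text-align: right;"| Y
| Conflict
| Ukraine
|-
| style="text-align: left;"| Venezuelan Violence Data
|
| style="text-align: right;"| Y
| Conflict
| LATAM - Mexico
|-
| style="text-align: left;"| World City Database
| Database of cities with population and general latitude/longitude information
| style="text-align: right;"| Y
| Country Data
| Global
|-
| style="text-align: left;"| TradingEconomics
| Mass database of metrics and indicators by country over time
| style="text-align: right;"| Y
| Country Data
| Global
|-
| style="text-align: left;"| GitNux Crime Reports
| Crime reports and stats
| style="text-align: right;"|
| Crime
| Global
|-
| style="text-align: left;"| Cloudflare Radar
| A view of outages, threats, rankings and more, based on the massive amount of Cloudflare data
| style="text-align: right;"| Y
| Cyber
| Global
|-
| style="text-align: left;"| TUM Data
| Large collection of datasets for computer vision research
| style="text-align: right;"| Y
| Cyber
|
|-
| style="text-align: left;"| CEPAL Cyber Attacks
| Cyber attacks in LATAM
| style="text-align: right;"| Y
| Cyber
| LATAM
|-
| style="text-align: left;"| OECD
| Foreign Direct Investment (FDI) Statistics
| style="text-align: right;"| Y
| Finance & Business
| Global
|-
| style="text-align: left;"| World Bank Data
| Economic Datasets
| style="text-align: right;"| Y
| Finance & Business
| Global
|-
| style="text-align: left;"| Scottish Futures Trust ROI Calculator
| Calculator that allows the user to calculate the expected return on investment of a building project
| style="text-align: right;"| Y
| Finance & Business
|
|-
| style="text-align: left;"| Numbeo
| Cost-of-living calculator and comparison tool. Useful for determining average prices around the world.
| style="text-align: right;"| Y
| Finance & Business
| Global
|-
| style="text-align: left;"| Reportlinker
| AI-enabled Market Intelligence Platform
| style="text-align: right;"| Y
| Finance & Business
| Global
|-
| style="text-align: left;"| Kaggle
| Data repository with many datasets for competitions
| style="text-align: right;"| Y
| Machine Learning
| Global
|-
| style="text-align: left;"| YouTube
| Video Searchable Database of Machine Learning Videos
| style="text-align: right;"| Y
| Machine Learning
| Global
|-
| style="text-align: left;"| Amazon S3
| Various datasets provided by Amazon Web Services
| style="text-align: right;"| Y
| Data Storage
| Global
|-
| style="text-align: left;"| AMI Data Set
| This dataset comprises data from a wide range of sources including the finance sector.
| style="text-align: right;"| Y
| Data Storage
| Global
|-
| style="text-align: left;"| Boston Data
| Boston city datasets
| style="text-align: right;"| Y
| City Data
| Boston
|-
| style="text-align: left;"| Social Europe
| Social Data
| style="text-align: right;"| Y
| Social
| Global
|-
| style="text-align: left;"| US Census
| Census Data
| style="text-align: right;"| Y
| Demographics
| USA
|-
| style="text-align: left;"| US Government Data
| Government Data
| style="text-align: right;"| Y
| Government
| USA
|-
| style="text-align: left;"| UK Government Data
| Government Data
| style="text-align: right;"| Y
| Government
| UK
|}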

Research Data Datasets