A wealth of data is released every week in the United States by organizations of all kinds, from federal, state and local agencies to companies, educational institutions and other nonprofits.
Many large federal datasets contain highly granular statistics that can serve as a launching point for local stories — for example, the location of alternative-fuel stations (graphic at left), campus crimes or payments to doctors by pharmaceutical and medical device companies.
As part of our periodically updated dataset digest series, we highlight new and recently updated datasets and databases, including those curated by capacity-building institutions such as Investigative Reporters & Editors (IRE), ProPublica and the Sunlight Foundation.
If you know of other datasets that could be useful, please email us or let us know on Twitter.
APRIL 2015 EDITION
Comparing hospital outcomes: Medicare.gov offers “Hospital Compare” data, last updated in December 2014. You can find surveys of patients’ experiences, death rates, payment information and more. Enter a ZIP code in its database: http://www.medicare.gov/hospitalcompare/search.html. Or download the datasets: https://data.medicare.gov/data/hospital-compare.
Storm injuries and damages: The National Climatic Data Center provides statistics on injuries and damages relating to all storms recorded between 1950 and the present: http://catalog.data.gov/dataset/ncdc-storm-events-database. Bulk download of all data is also available: http://www.ncdc.noaa.gov/stormevents/ftp.jsp.
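For readers who take the bulk-download route, injury totals by state can be tallied with a few lines of Python. This is only a minimal sketch under stated assumptions: it supposes a CSV file with columns named STATE, INJURIES_DIRECT and INJURIES_INDIRECT, so check the header row of the file you actually download and adjust the names. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def injuries_by_state(csv_file):
    """Sum direct and indirect injuries per state from an open CSV file."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        # Column names are assumptions; blank cells count as zero.
        direct = int(row.get("INJURIES_DIRECT") or 0)
        indirect = int(row.get("INJURIES_INDIRECT") or 0)
        totals[row["STATE"]] += direct + indirect
    return totals


# Made-up sample rows standing in for a downloaded file.
sample = io.StringIO(
    "STATE,INJURIES_DIRECT,INJURIES_INDIRECT\n"
    "OKLAHOMA,3,1\n"
    "TEXAS,0,2\n"
    "OKLAHOMA,5,\n"
)
totals = injuries_by_state(sample)
for state, count in totals.most_common():
    print(state, count)
```

To run it on a real download, replace the sample with an opened file, for example open("storm_details.csv", newline=""), where the filename is hypothetical.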
Small business loans: Investigative Reporters & Editors (IRE) through its NICAR database has new data on small business loans backed by the government (1990-2014). Find names and addresses, lenders, loan amounts, loan status and more: https://www.ire.org/blog/nicar/2015/03/20/updated-small-business-loans-data-now-available ($50 for members; $150 for non-members).
Campus crime reports: The NICAR database also offers the latest college campus crime data (2013), cleaned up and usefully consolidated. IRE also offers tips on how to cover your local institutions of higher education: https://www.ire.org/blog/nicar/2015/01/06/new-campus-crime-reports-2013-available-nicar-data ($25 fee for members; $75 for non-members).
Doctors prescribing drugs: ProPublica has cleaned-up Medicare Part D prescriber/doctor data (2012), including “providers’ names, addresses, specialties and contact information, as well as additional information on doctors’ prescribing habits.” Download: https://projects.propublica.org/data-store/sets/health-prescribers-2 ($200 for journalists). Or use their app to search: http://projects.propublica.org/checkup.
Payments to doctors by companies: ProPublica also offers Medicare/Medicaid Open Payments data, which covers payments made to doctors by pharmaceutical and medical device companies. Download: https://projects.propublica.org/data-store/sets/openpayments-drugdevice-1 ($200 for journalists). Or use their app to search: http://projects.propublica.org/open-payments.
Harvesting all criminal justice datasets: The Sunlight Foundation is amassing a huge inventory of all criminal-justice datasets from the federal government and the states (26 so far). Search, download or contribute: http://sunlightfoundation.com/criminaljustice.
Alternative fuel stations in the United States: The Energy Department has data on all alternative fuel stations (everything from biodiesel to electric): http://catalog.data.gov/dataset/alternative-fueling-station-locations-b550c. Also see an interactive map: http://www.afdc.energy.gov/locator/stations.
Bus and large truck crashes: The Transportation Department has updated its data on crashes involving large vehicles on U.S. roads. While the raw data isn’t available, there is a database with useful filters: https://ai.fmcsa.dot.gov/CrashStatistics/rptSummary.aspx.
Environmental health hazards: The CDC offers data from the National Environmental Public Health Tracking Network, a “system of integrated health, exposure, and hazard information and data from a variety of national, state, and city sources”: http://ephtracking.cdc.gov.
Nursing home problems: The Centers for Medicare and Medicaid Services has information on nursing homes with serious quality issues and their status: http://www.cms.gov/Medicare/Provider-Enrollment-and-Certification/CertificationandComplianc/Downloads/SFFList.pdf.
Beer production: The Alcohol and Tobacco Tax and Trade Bureau, part of the U.S. Treasury Department, provides monthly updates on brewery production: http://www.ttb.gov/beer/beer-stats.shtml.
Defense Department dataset list: The Sunlight Foundation has long been doggedly pursuing a comprehensive list of government datasets through FOIA. One of the interesting recent disclosures from Sunlight’s request was a Defense Department list of datasets: http://www.defense.gov/data.json. But as Sunlight notes: “The Department of Defense, somehow, has not cataloged within its index any ‘non-public’ or ‘restricted’ data, nor does it appear to have redacted any information under FOIA. Hopefully this reflects a choice to focus on publishable data, but, perhaps obviously, Defense is an agency we expect to have a lot of nonpublic information — information that still very much needs to be indexed and tracked.”
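One way to check Sunlight's observation for any agency's data.json file is to count its entries by access level. The sketch below is an illustration, not Sunlight's method. It assumes the file follows the federal open data catalog layout, in which the dataset entries sit either in a bare list or under a "dataset" key, and each entry carries an "accessLevel" field with values such as "public", "restricted public" or "non-public". The function name and the sample catalog are hypothetical.

```python
import json
from collections import Counter


def access_level_counts(catalog):
    """Count datasets in a parsed data.json catalog by their accessLevel."""
    # Some catalogs are a bare list; others wrap the list in a "dataset" key.
    datasets = catalog if isinstance(catalog, list) else catalog.get("dataset", [])
    return Counter(entry.get("accessLevel", "unspecified") for entry in datasets)


# Made-up catalog text standing in for a downloaded data.json file.
sample_text = """
{"dataset": [
  {"title": "Example A", "accessLevel": "public"},
  {"title": "Example B", "accessLevel": "public"},
  {"title": "Example C"}
]}
"""
counts = access_level_counts(json.loads(sample_text))
print(dict(counts))
```

A catalog that shows only "public" entries, as in the sample, would match the pattern Sunlight describes.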
OCTOBER 2014 EDITION
Labor violations enforcement in your area: A regularly updated database from the U.S. Department of Labor on all manner of violations, from wages to safety. Search by state, ZIP code, company name and more. Export findings in a CSV file. Updated October 2014: http://enforcedata.dol.gov/views/search.php.
Complaints against banks in your state: The Consumer Financial Protection Bureau tracks complaints about banks, financial products and services, and makes the information available on Data.gov. Drill down and sort by state, date and lender to reveal which institutions are subject to the most public complaints. Exportable as an XML file. Updated March 2014: http://catalog.data.gov/dataset/consumer-complaint-database#topic=developers_navigation.
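The drill-down described above can also be done offline. The following is a minimal sketch that assumes you have the complaints saved as a CSV file (not the XML export) with columns named "Company" and "State". Those column names, the function name and the sample rows are assumptions for illustration, so verify them against the file you export.

```python
import csv
import io
from collections import Counter


def top_companies(csv_file, state, limit=5):
    """Return the companies with the most complaints in one state."""
    counts = Counter(
        row["Company"]
        for row in csv.DictReader(csv_file)
        if row["State"] == state
    )
    return counts.most_common(limit)


# Made-up sample rows standing in for an exported file.
sample = io.StringIO(
    "Date received,Company,State\n"
    "2014-01-05,Bank A,OH\n"
    "2014-01-09,Bank B,OH\n"
    "2014-02-11,Bank A,OH\n"
    "2014-02-12,Bank A,TX\n"
)
ranking = top_companies(sample, "OH")
print(ranking)
```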
Public schools’ disciplinary practices: Schools’ use of restraints and seclusion on students, from ProPublica, which also provides some handy “reporting recipes” for pulling out stories. Slice the data by state, district and school to localize. Updated June 2014: https://projects.propublica.org/data-store/sets/education-restraint-and-seclusions.
Youth voting turnout in midterm elections: Historic patterns of youth voting, highlighting the possible impact of young voters during the 2014 midterm elections, from Tufts University’s Center for Information and Research on Civic Learning and Engagement. Data on minorities and competitive states are broken out. Updated October 2014: http://www.civicyouth.org/2014-midterms-young-voters-in-competitive-senate-races.
Farms in your area: All of America’s farms by type and size, with local economic and demographic data available. The USDA breaks out the data from its 2012 agricultural census. Datasets are in PDFs and were last updated in September 2014: http://catalog.data.gov/dataset/2012-census-of-agriculture.
Firearms dealers and manufacturers: Dataset of federal firearms licensees maintained by the Bureau of Alcohol, Tobacco, Firearms and Explosives. State- and local-level data are available, as is licensee type. Updated September 2014: http://www.atf.gov/content/firearms/firearms-industry/listing-FFLs.
Hazardous-material incidents: Latest data on reported hazmat incidents, in a searchable database maintained by the U.S. Department of Transportation. Enter location-based and incident-type restrictions and export as a CSV file. Updated October 2014: https://hazmatonline.phmsa.dot.gov/IncidentReportsSearch.
Toxic chemicals in your community: The Environmental Protection Agency’s Toxics Release Inventory (TRI) Program provides a range of data sources and tools for a better understanding of how your community might be affected by dangerous substances. Data is currently updated through 2013, and reports can be downloaded in XLS, CSV, PDF and RTF formats: http://www2.epa.gov/toxics-release-inventory-tri-program/tri-data-and-tools.
Comparing your city’s spending to other cities: A “fiscally standardized cities” database, created and maintained by the Lincoln Institute of Land Policy, allows you to make granular, apples-to-apples comparisons of the finances of 112 U.S. cities, including revenues, direct expenditures, capital outlays and more. You can create tables and export them to CSV files based on your analyses. The latest datasets are based on 2011 data, but the advantage is that you can look at statistical trends going back more than a decade: http://www.lincolninst.edu/subcenters/fiscally-standardized-cities.
County-level business patterns: Examine trends of economic activity at the county level through datasets maintained by the U.S. Census Bureau. Find data on types of businesses, employees and payroll. (Fair warning: You’ll need to brush up on how the government codes business types and geography.) Updated through 2012: http://catalog.data.gov/dataset/county-business-patterns/resource/2bfcc388-170b-4c60-8b41-addba3bef1d4.
Keywords: data journalism, local reporting, dataset digest, big data