
Data journalism syllabus: From numeracy to visualization and beyond

This syllabus offers faculty a guide for teaching basic data journalism skills, including statistical and visualization techniques, over a 13-week semester.


The skills required to be a successful data journalist are many, ranging from numeracy and spreadsheet fluency to being able to create visualizations and interpret and perform statistical analyses. In most moderate to large newsrooms, some data tasks are divided among desks and departments, with reporters, editors, designers and coders working in teams. Still, it is important for all team members to have some familiarity with what the others are doing. And the core skills of working with numbers and telling stories in the public interest are fundamental to all newsroom work.

This syllabus covers these core skills while also giving students some familiarity with relevant software, statistical and visualization techniques and programming. Overall, issues of data ethics and valid interpretation are front and center here. This syllabus is informed by the idea that data journalism is practiced in its highest form not when it merely produces dazzling graphics, but when its methods are used to investigate wrongdoing, hold the powerful accountable and spotlight public policy failings.

Computer-assisted reporting, or CAR, has been around for decades. While this area of journalism has long been considered an important subfield in newsrooms and journalism schools, societal and industry changes now demand that the basic skills needed to work with data become, in effect, ubiquitous and mainstream among reporters, editors and instructors. A structural shift in how information is being produced, used, and, at times, misused, dictates a shift in how journalists prepare for the profession. Nearly all of the powerful institutions in society – from government agencies to businesses, sports franchises to insurance firms – are heavily invested in collecting and leveraging data. There is simply no way journalists can perform their watchdog functions if they do not have baseline skills and knowledge to interrogate the activities of these agencies.

Background resources and context:

For journalism schools and faculty looking for wider recommendations and ideas about how to incorporate data-related skills and knowledge into the curriculum, the 2016 Knight Foundation-Columbia Journalism School report “Teaching Data and Computational Journalism,” by Charles Berret and Cheryl Phillips, offers a comprehensive overview of the field, based on a survey of 113 schools. The American Press Institute has also published a series of articles describing how data journalism could be integrated into curricula.

A 2017 study based on data collected through Harvard’s Journalist’s Resource project, “Knowing the Numbers,” offers an overview of the debates in the media industry and academia over journalists’ proverbial “math phobia” as well as its consequences and what can be done about it. For those interested in the evolution of this field, see Mark Coddington’s study “Clarifying Journalism’s Quantitative Turn,” Digital Journalism, 2015.

Course objectives

Students should be able to:

  • Think critically and deeply about the limitations of datasets and evaluate the strengths and weaknesses of data.
  • Assess how institutions may be collecting and using data and the implications of these processes for the public.
  • Use and manipulate datasets with ease and comfort, being able to ask interesting questions and explore various angles.
  • Deploy basic software and applications of various kinds to analyze and visualize data in creative ways.
  • Demonstrate a solid grasp of data storytelling techniques that can help broad audiences understand data.

Course sequence design

This course will acquaint students with the basics of cleaning, analyzing and interpreting information in tabular form – rows and columns. It will challenge them to improve their understanding of numbers and quantification, as well as offer tools and frameworks for presenting data to audiences. The syllabus also covers special topics such as interpreting academic research, advanced visualization techniques and emerging fields such as artificial intelligence.

Supplemental texts:

Various articles are suggested as readings for each unit. While no single text is required for this sequence of lessons, the following list may be useful for instructors and students:

  • Jonathan Stray, The Curious Journalist’s Guide to Data, 2016.
  • Brant Houston, Computer-Assisted Reporting: A Practical Guide, 2014.
  • David Herzog, Data Literacy: A User’s Guide, 2016.
  • The Data Journalism Handbook, eds. Gray, Bounegru, Chambers, 2012.
  • Alberto Cairo, The Functional Art: An Introduction to Information Graphics and Visualization, 2013.
  • John W. Foreman, Data Smart: Using Data Science to Transform Information Into Insight, 2014.
  • Tamara Munzner, Visualization Analysis and Design, 2014.
  • Philip Meyer, The New Precision Journalism, 1991.

Regularly review these publications:

ProPublica
The Upshot (The New York Times)
FiveThirtyEight
Vox 

Data resources:

 

Week 1: Drilling Down on Numbers
While many students have taken advanced math at some point in their academic lives, most need a refresher on basic concepts. Working with raw data in tabular form can seem like a novel task, even though the analytical tasks of arithmetic, ratios, rates and the like are not particularly complex. This week focuses on familiarizing students with basic strategies for doing data analysis and introducing some frameworks for critical thinking.
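
As a refresher on the rate arithmetic mentioned above, here is a minimal Python sketch. The function name and all figures are made up for illustration and are not drawn from any dataset in this syllabus. It shows why a raw count and a rate can rank two groups differently.

```python
def rate_per_100k(events, population):
    """Return the number of events per 100,000 people (or workers)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * 100_000 / population


# Hypothetical figures, for illustration only.
groups = {
    "Occupation A": {"deaths": 50, "workers": 40_000},
    "Occupation B": {"deaths": 900, "workers": 3_000_000},
}

# Convert each raw count into a rate so the groups can be compared fairly.
rates = {
    name: rate_per_100k(g["deaths"], g["workers"])
    for name, g in groups.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per 100,000 workers")
```

In this invented example, Occupation B has far more deaths in total, but Occupation A has the higher rate because its workforce is much smaller. Which number leads the story is an editorial decision, and students should be able to defend it.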

Class 1: Specifics of counting and quantification

Readings:

Activity:
Use the DataBasic.io tutorial on data in tabular form and CSV files to explore data on passengers of the Titanic. Look at visualization of data for each column in the dataset and discuss the nature of the data offered, inferences that could be made and limits of the data.

Class 2: Numeracy and the importance of critical thinking

Readings:

Activity:
Students should explore the website CensusReporter and identify towns or cities they might have an interest in covering. They should review the demographic profiles of these municipalities, note interesting patterns and compile a list of ideas for stories they might pursue using this data.

Dataset & data story of the week:
“The Deadliest Jobs in America,” Bloomberg News, May 2015
Census of Fatal Occupational Injuries, U.S. Department of Labor

 

Week 2: Data in Tabular Form: The Fundamentals

This week focuses on the core skills of data manipulation. To facilitate foundational knowledge in how to manipulate and analyze data in tabular form, instructors should assign the NICAR Coursepack, or a similar sequence of Excel, Google Sheets or other spreadsheet-oriented exercises.

As a way of framing this essential, but often painstaking, work, students should read the interviews that Journalist’s Resource has done with two prominent data journalists, Sarah Cohen of The New York Times and Steve Doig of Arizona State University, as well as Scott Klein’s 2016 article published in Nieman Journalism Lab, “Want to Start a Small Data Journalism Team in Your Newsroom? Here are 8 Steps.” With these expert views in mind, students should reflect on the skills they are building, the areas in which they want to build further knowledge and what they believe are keys for success in the field.

Class 1: Sorting, Summing and Percentage Change

Readings/Materials:

Activity:
Students should explore ProPublica’s “Debt by Degrees” database, which provides information on student debt issues and schools. Afterward, students should identify patterns and potential stories they think would interest news audiences in their state and region.
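
The three operations named in this class title are normally done in a spreadsheet. For instructors who want to show the same logic in code, here is a minimal Python sketch. The rows, column meanings and function name are hypothetical and are not taken from the ProPublica database.

```python
def percent_change(old, new):
    """Percentage change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) * 100 / old


# Hypothetical rows, for illustration only: (name, earlier value, later value)
rows = [
    ("School A", 20_000, 25_000),
    ("School B", 30_000, 27_000),
    ("School C", 10_000, 16_000),
]

# Summing: the total of one column across all rows.
total_later = sum(later for _, _, later in rows)

# Percentage change for each row.
changes = [(name, percent_change(earlier, later)) for name, earlier, later in rows]

# Sorting: largest increase first.
ranked = sorted(changes, key=lambda pair: pair[1], reverse=True)

for name, change in ranked:
    print(f"{name}: {change:+.1f}%")
```

Note that the division is by the earlier value, not the later one. Mixing up the two is a common source of wrong percentages in news copy.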

Class 2:

Readings/Materials:

Activity:
Using sorting and filtering techniques, students should use data collected through 311 telephone calls to practice mapping civic complaints in a city. To locate and map the data, follow the Storybench.org tutorial and import the data into Carto.com.
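
Before mapping, it can help to see what filtering and sorting do to the rows themselves. The following Python sketch uses a handful of invented complaint records. The field names are assumptions, since real 311 exports differ from city to city.

```python
from collections import Counter

# Hypothetical complaint records, for illustration only.
records = [
    {"type": "Pothole", "neighborhood": "North"},
    {"type": "Noise", "neighborhood": "North"},
    {"type": "Pothole", "neighborhood": "South"},
    {"type": "Pothole", "neighborhood": "North"},
    {"type": "Graffiti", "neighborhood": "South"},
]

# Filtering: keep only the rows that match a condition.
potholes = [r for r in records if r["type"] == "Pothole"]

# Counting, then sorting from most to fewest complaints.
by_neighborhood = Counter(r["neighborhood"] for r in potholes)
ranked = by_neighborhood.most_common()

for neighborhood, count in ranked:
    print(f"{neighborhood}: {count} pothole complaints")
```

A higher complaint count may reflect who calls 311 as much as where the problems are, which is worth discussing before any map is published.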

Dataset & data story of the week:
Andrew Ryan, et al., “City Payroll Soars after Police and Fire Deals,” The Boston Globe, 2015
City of Boston employee payroll data, 2014

 

Week 3: Challenges with Data: Finding and Cleaning
Getting clean data is rarely easy, and it should come as a relief for data journalists to know that even the most accomplished data scientists spend a substantial amount of time cleaning and transforming datasets for use. It is slow and patient work, requiring rigorous systems and work sequences to ensure data integrity at all steps of the process. Still, there are large, professionally-curated administrative datasets that are increasingly easy to use and can be accessed from statistical collection agencies at the federal and state levels of government. (See Journalist’s Resource to find all of the federal government’s administrative datasets in one place.) This week, students will look at some of the challenges associated with data requests, cleaning and analysis.

Class 1:

Readings:

Activity:
Public records requests are a key piece of data journalism. Students should review the activities of a government information-requesting project called MuckRock as well as another project called FOIA Mapper. Students should make a request for data through MuckRock or directly through a government website. Careful consideration should be given to the scope of the request and the language used. Use the search tools at FOIA Mapper to review similar requests.

Class 2:

Readings:

Activity:
Students should download OpenRefine (Mac and PC versions), and try cleaning some data with it. They should also try Tabula to extract data tables from PDF files. Specifically, students might use these tools to review documents and data from local charities or nonprofit organizations. See “Investigating Nonprofits and Charities: Where to Find Internal Data, Public Records,” from Journalist’s Resource.
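
To give students a feel for what a cleaning tool does when it groups near-duplicate names, here is a simplified Python sketch. It is loosely inspired by the key-matching idea behind clustering features in tools such as OpenRefine, but it is not that tool's actual implementation. The function name and the organization names are invented for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a crude matching key: trim, lowercase, strip punctuation,
    then sort and de-duplicate the words."""
    lowered = value.strip().lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


# Hypothetical organization names, for illustration only.
names = [
    "Acme Relief Fund",
    "acme relief fund ",
    "Acme Relief Fund, Inc.",
    "Fund, Acme Relief",
    "Riverside Food Bank",
]

# Group the original spellings under their shared key.
clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)

for key, members in clusters.items():
    print(f"{key!r}: {members}")
```

In this sketch the "Inc." variant lands in its own group, because the extra word changes the key. Automated matching only proposes candidates; a person still has to decide whether two names refer to the same organization.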

Dataset & data story of the week:
Ben Casselman, “Where Police Have Killed Americans In 2015,” FiveThirtyEight
Police Killings, FiveThirtyEight/data, GitHub

 

Week 4: Statistics: Basics of Inference, Correlation, Probability
The ability to manipulate numbers in a sophisticated way is increasingly important in data journalism. This week presents a variety of perspectives and empowers students with knowledge and tools to both interpret and perform some of this work.
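
Because correlation is one of this week's topics, instructors may want to show how the Pearson correlation coefficient is computed step by step. The Python sketch below uses invented numbers and a hypothetical function name, and it is meant only as a teaching aid.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. Neither says anything about whether one variable causes the other.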

Class 1:

Readings:

Activity:
To prepare students to be critical consumers and producers of representations of data, they should review the practices displayed in “How to Spot Visualization Lies,” by Nathan Yau (Flowing Data, 2017). In teams, students should find 5 to 10 online visualizations that are flawed in some way and then describe how these visualizations could be improved.

Class 2: Polling and surveys

Readings:

Activity:
Read: Harry Enten, “13 Tips For Reading General Election Polls Like A Pro,” from FiveThirtyEight. Discuss the national election results of 2016 and problems with polling. Students should blog about important lessons learned.
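
When discussing polls, it may help to show where a reported margin of error comes from. The Python sketch below uses the standard textbook formula for a proportion from a simple random sample at roughly 95 percent confidence (z = 1.96). The function name is hypothetical, and the simple-random-sample assumption is a simplification: real polls use weighting and more complex designs, so their true uncertainty is often larger.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for a proportion, assuming a simple random sample.

    proportion=0.5 gives the widest (most cautious) margin.
    z=1.96 corresponds to roughly 95 percent confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (500, 1000, 4000):
    points = margin_of_error(n) * 100
    print(f"n = {n}: plus or minus {points:.1f} percentage points")
```

Quadrupling the sample size only halves the margin, which is one reason most national polls settle for samples of about 1,000 people.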

Dataset & data story of the week:
Gabriel Dance, Tom Meagher, “Crime in Context,” The Marshall Project, 2016
FBI’s “Crime in the United States, 2015” report

 

Week 5: Visualization Foundations
The art of data visualization has many forms and degrees of sophistication, from basic web applications to programming tools such as the JavaScript library D3 and the R language. Simplicity and clarity are the chief virtues of data graphics. But interactive functions, which can be complicated to create, often help audiences explore data in layers and home in on the facts and information most relevant to their own lives. This week looks at basic concepts and explores compelling recent examples in journalism.

Class 1: Visualization basics

Readings:

Activity: Break into teams and work on ways to show the relationships between two quantities through visual encoding. Refer to “45 Ways to Communicate Two Quantities,” Santiago Ortiz, 2013.

Class 2: Data visualization in journalism practice

Readings:

Activity: Make a simple line graph, charting a single variable over time. You might use this Journalist’s Resource post for guidance: “Dataset Digest: From Data.gov to Chartbuilder, A Lesson with Organic Farm Data.”

Dataset & data story of the week:
Jennifer Oldham, “Exhaustion Is Her Copilot: 6 Days with a Michigan Trucker,” Bloomberg News, 2014; (graphic) “Trucker’s Odyssey,” Bloomberg News, 2014
Large Truck and Bus Crash Facts, U.S. Department of Transportation

 

Week 6: Advanced Visualization Techniques
This week looks at some of the research and deeper thinking related to data visualization. Some foundational studies in the field are introduced, and some noteworthy applications are explored.

Class 1:

Readings:

Activity:
Tableau is a visualization software tool that classes can use for free; Tableau Public is a web application that also is free. See data journalism examples at Tableau and a gallery of recent hits. Students should familiarize themselves with Tableau’s user interface and then produce a visualization of moderate complexity.

Class 2:

Readings:

Activity:
Students should review the following: “10 Things You Can Learn from the New York Times’ Data Visualizations,” Andy Kirk, 2012; and selections from Flowing Data’s library of infographics. Students should write a blog post discussing two or three data visualizations and explain what specific techniques make these examples stand out.

Data story of the week:
Students should review selections from the New York Times’s Graphics department.

 

Week 7: Interpreting Academic Research: Part 1
The world of academic research is part of data journalism. Several leading news sites do work in this area, including FiveThirtyEight, Vox and The Upshot. They focus heavily on new research findings. Students should familiarize themselves with academic search engines and databases such as Google Scholar, PubMed, Microsoft Academic Search and the National Bureau of Economic Research.

Class 1:

Readings:

Activity:
Students should use the Journalist’s Resource database to identify several studies for potential reporting projects. They should draw up a list of questions to ask the researchers who authored the studies selected. 

Class 2:

Readings:

Activity:
Students should locate studies on the Journalist’s Resource website that have a strong geographic dimension and then relate them to public policy issues. Review the studies to generate ideas about the kinds of data that can spark good stories. Using the mapping application Carto.com, students should map government data in a way that clearly informs the public about an important policy issue (for example, polluting factories in a state or schools that underperform).

Data story of the week:
“Poisoned Places,” NPR and Center for Public Integrity, 2014
Toxics Release Inventory (TRI) Program, U.S. Environmental Protection Agency

 

Week 8: Interpreting Academic Research: Part 2
This week deepens students’ understanding of academic research and data analysis. The second class proposes a case study around climate change, prompting students to use research to inform their reporting on relevant local data.

Class 1:

Readings:

Activity:
Students should use the application Timeline JS to create a sequence that tells the story of the evolution of knowledge in a field of study, for example landmark studies in cancer research or major works on poverty in U.S. cities. The timeline need not be comprehensive, but it should help an audience understand how academic knowledge about an issue has grown and changed. Using Google Scholar, students can find the most highly cited studies on any given topic and examine chronology and citations.

 

Class 2: Scientific literature and data: Climate change case study

Readings:

Activity:
Students should review “Writing about Think Tanks and Using Their Research: A Cautionary Tip Sheet,” from Journalist’s Resource. They should find studies from think tanks that may be biased for partisan reasons or potentially compromised by industry funding. In a blog post, they should discuss pieces of research that may be problematic for journalists and how that research might be properly cited and contextualized, if used at all.

Dataset & data story of the week:
Students should review selections from ProPublica’s data section.

 

Week 9: Special Topic 1: Health, Well-being and Medical Data
Health and medicine are tricky terrain for journalists, as new studies and data can be of utmost public importance but also promoted with hype and spin. In addition, health and medical topics are fraught with statistical perils.

To get students thinking, they might watch the well-known TED Talk by Hans Rosling, “The Best Stats You’ve Ever Seen,” about global health statistics and data problems, and review “Statistics for Journalists,” by Connie St. Louis. Students should then find an article or blog where they believe health, medical or epidemiological statistics might be used in a misleading way. They should write a short critique raising questions about the news item.

Class 1:

Readings:

Activity:
Students should use the federal Centers for Medicare & Medicaid Services’ Open Payments database to examine patterns of payments among doctors in their community. They should compile the data and create visualizations that could inform a news audience.

Class 2:

Readings:

Activity:

Explore the ProPublica database “Treatment Tracker” and the associated story. Students should look at the “Local Stories” column on the ProPublica site and examine how other news outlets brought subsets of data to their audiences. With that in mind, students should produce a data graphic that can tell a story relevant to a local audience.

Dataset & data story of the week:
Andrea Ball, Eric Dexheimer, “Missed Signs. Fatal Consequences,” Austin American-Statesman, 2015
Child abuse and neglect fatality database, Texas Child Protective Services, Austin American-Statesman

 

Week 10: Special Topic 2: Economic and Business Data
Perhaps the first subfield of journalism to embrace data, economic and business reporting is full of numbers and figures. But it is also a field filled with confusing and highly specialized subject areas, where numbers require a lot of context for interpretation. This week looks at select topics reporters might encounter, from accounting and small business concerns to housing and trade.

Class 1:

Readings:

Activity:
Review the Journalist’s Resource tip sheet “Free tools for Visualizing Economic Data” and create a chart or graph using FRED and World Bank applications.

Class 2:

Readings:

Activity:
Investigate and chart local housing price trends over time. Use the application Plot.ly to make a chart or graph.

Dataset & data story of the week:
Find and review data relating to the intersection of business and politics at the Sunlight Foundation, OpenSecrets.org and FollowtheMoney.org.

 

Week 11: Special Topic 3: Crime and Public Safety Data
Crime and criminal justice reporting, among the most controversial areas of journalism, has attracted criticism from academic researchers for decades. News outlets’ tendency to hype violent crime and focus on episodic events can fuel public demands for all sorts of ill-advised policies. At the same time, journalists have also been accused of overlooking and ignoring important trends. This is difficult territory, and this week dives into some of these important issues.

Class 1:

Readings:

Activity:
Examine the ProPublica project “Documenting Hate,” which attempts to collect data on hate crimes across the U.S. in a deeper, more thorough way than the government does. Ask students to sketch out and prototype a project of their own that would use crowdsourcing and networking techniques to collect hard-to-get data of some other kind in the field of criminal justice.

Class 2:

Readings:

Activity:

Use the following custom tutorial for Tableau visualizations of homicide and exoneration data. Students should follow the steps and produce both the bar chart and the tree map explained in the tutorial. Discuss how using a tree map can help reporters explore data.

Dataset & data story of the week:
“Fatal Force” dataset and series, The Washington Post

 

Week 12: Frontiers: Algorithms, Data Science, Artificial Intelligence
New trends in the fields of data science, machine learning and artificial intelligence may radically change the way journalists approach quantitative information. It is perhaps too soon to tell. But this week provides a solid overview of emerging fields and their possible implications.

Class 1:

Readings:

Activity:
Students should familiarize themselves with the programming language R and how it is used in research and data journalism. Watch the following video: “FiveThirtyEight’s Data Journalism Workflow with R,” useR! 2016 Conference. Also, listen to this podcast: “Amanda Cox on Working With R, NYT Projects, Favorite Data,” Data Stories, 2016. Then walk through the tutorial “Getting Started with R in RStudio Notebooks” (Martin Frigaard, Storybench, 2016).

Class 2:

Readings:

Activity:
Continuing with their work in R, students should complete the tutorial “How to Create a Simple Line Chart in R” (Aleszu Bajak, Storybench, 2017).

Dataset & data story of the week:
Julia Angwin, Jeff Larson, Surya Mattu, Lauren Kirchner, “Machine Bias,” ProPublica, 2016
COMPAS Recidivism Algorithm dataset, ProPublica 

 

Week 13: Ethical Issues in Data Journalism
Data journalism is an exciting field, but it carries with it substantial responsibilities for reporters and editors, as they are often making original interpretations of datasets for the public. This final week looks at some of the pitfalls and problems associated with the field and shares cautionary lessons.

Class 1:

Readings:

Activity:
Read “Connecting the Dots” by Jacob Harris (2015) and discuss how people should or should not be represented through news visualizations. Students should find examples of visualizations produced by news organizations, some that are exemplary and some that are possibly problematic.

Class 2:

Readings:

Activity:
Read the following article from Alberto Cairo: “Data Journalism Needs to Up Its Own Standards” (Nieman Journalism Lab, July 2014). In a blog post, students should respond to the critiques presented in the article and suggest ways data journalists can overcome the challenges articulated.

Dataset & data story of the week:
Review the data collection and analysis efforts of the Texas Tribune.

 

_______

A special thanks to John Wihbey, assistant professor at Northeastern University and a consultant to Journalist’s Resource, for his help preparing this syllabus.
