Expert Commentary

5 things journalists need to know about statistical significance

Statistical significance is a highly technical, nuanced mathematical concept. Journalists who cover academic research should have a basic understanding of what it represents and the controversy surrounding it.

It’s easy to misunderstand and misuse one of the most common — and important — terms in academic research: statistical significance. We created this tip sheet to help journalists avoid some of the most common errors, which even trained researchers sometimes make.

When scholars analyze data, they look for patterns and relationships between and among the variables they’re studying. For example, they might look at data on playground accidents to figure out whether children with certain characteristics are more likely than others to suffer serious injuries. A high-quality statistical analysis will include separate calculations that researchers use to determine statistical significance, a form of evidence that indicates how consistent the data are with a research hypothesis.

Statistical significance is a highly technical, nuanced concept, but journalists covering research should have a basic understanding of what it represents. Health researchers Steven Tenny and Ibrahim Abdelgawad frame statistical significance like this: “In science, researchers can never prove any statement as there are infinite alternatives as to why the outcome may have occurred. They can only try to disprove a specific hypothesis.”

Researchers try to disprove what’s called the null hypothesis, which is “typically the inverse statement of the hypothesis,” Tenny and Abdelgawad write. Statistical significance indicates how inconsistent the data being examined are with the null hypothesis.

If researchers studying playground accidents hypothesize that children under 5 years old suffer more serious injuries than older kids, the null hypothesis could be that there is no relationship between a child’s age and playground injuries. If a statistical analysis uncovers a relationship between the two variables and researchers determine that relationship to be statistically significant, the data are not consistent with the null hypothesis.
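
To make that concrete, here is a minimal sketch in Python of one common way to test such a null hypothesis, a chi-square test of independence. The injury counts are invented for illustration and are not from any real study.

```python
# A minimal, hypothetical sketch of testing the playground null hypothesis
# (no relationship between age group and injury severity) with a chi-square test.
from scipy.stats import chi2_contingency

# Rows: children under 5, children 5 and older
# Columns: serious injuries, minor injuries (made-up counts)
observed = [[40, 60],
            [20, 80]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")

# A small p-value (conventionally below 0.05) means the data are inconsistent
# with the null hypothesis. It does not prove the research hypothesis is true.
```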

To be clear, statistical significance is evidence used to decide whether to reject or fail to reject the null hypothesis. Getting a statistically significant result doesn’t prove anything.

Here are some other things journalists should know about statistical significance before reporting on academic research:

1. In academic research, significant ≠ important.

Sometimes, journalists mistakenly assume that research findings described as “significant” are important or noteworthy — newsworthy. That’s typically not correct. To reiterate, when researchers call a result “statistically significant,” or simply “significant,” they’re indicating how consistent the data are with their research hypothesis.

It’s worth noting that a finding can be statistically significant but have little or no clinical or practical significance. Let’s say researchers conclude that a new drug drastically reduces tooth pain, but only for a few minutes. Or that students who complete an expensive tutoring program earn higher scores on the SAT college-entrance exam — but only two more points, on average. Although these findings might be significant in a mathematical sense, they’re not very meaningful in the real world.
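
A quick simulation shows how that can happen. The sketch below uses made-up, SAT-like numbers rather than data from any real tutoring program: with very large samples, even a two-point average difference yields a p-value far below 0.05.

```python
# Hypothetical illustration: a trivially small difference can still be
# "statistically significant" when the sample is very large.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 1_000_000  # very large samples

control = rng.normal(loc=1050, scale=200, size=n)  # simulated SAT-like scores
tutored = rng.normal(loc=1052, scale=200, size=n)  # only 2 points higher on average

t_stat, p_value = ttest_ind(tutored, control)
print(f"average difference: {tutored.mean() - control.mean():.1f} points")
print(f"p-value: {p_value:.1e}")  # far below 0.05, yet practically meaningless
```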

2. Researchers can manipulate the process for gauging statistical significance.

Researchers use sophisticated software to analyze data. For each pattern or relationship detected in the data — for instance, one variable increases as another decreases — the software calculates what’s known as a probability value, or p-value.

P-values range from 0 to 1. If a p-value falls under a certain threshold, researchers deem the pattern or relationship statistically significant. If the p-value is greater than the cutoff, that pattern or relationship is not statistically significant. That’s why researchers hope for low p-values.

Generally speaking, p-values smaller than 0.05 are considered statistically significant.
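
Here is what that decision rule looks like in practice, using invented numbers and Python’s SciPy library. The test asks whether one variable tends to fall as the other rises, then compares the p-value to the 0.05 cutoff.

```python
# Hypothetical sketch of the p-value decision rule, using a correlation test.
from scipy.stats import pearsonr

hours_of_tutoring = [1, 2, 3, 4, 5, 6, 7, 8]       # made-up data
errors_on_test    = [14, 12, 13, 10, 9, 7, 8, 5]   # made-up data

r, p_value = pearsonr(hours_of_tutoring, errors_on_test)
alpha = 0.05  # conventional threshold for statistical significance

print(f"correlation = {r:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Statistically significant: the data are inconsistent with the null hypothesis.")
else:
    print("Not statistically significant at the 0.05 level.")
```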

“P-values are the gatekeepers of statistical significance,” science writer Regina Nuzzo, who’s also a statistics professor at Gallaudet University in Washington, D.C., writes in her tip sheet, “Tips for Communicating Statistical Significance.”

She adds, “What’s most important to keep in mind? That we use p-values to alert us to surprising data results, not to give a final answer on anything.”

Journalists should understand that p-values are not the probability that the hypothesis is true. P-values also do not reflect the probability that the relationships in the data being studied are the result of chance. The American Statistical Association warns against repeating these and other errors in its “Statement on Statistical Significance and P-Values.”

And p-values can be manipulated. One form of manipulation is p-hacking, when a researcher “persistently analyzes the data, in different ways, until a statistically significant outcome is obtained,” explains psychiatrist Chittaranjan Andrade, a senior professor at the National Institute of Mental Health and Neurosciences in India, in a 2021 paper in The Journal of Clinical Psychiatry.

He adds that “the analysis stops either when a significant result is obtained or when the researcher runs out of options.”

P-hacking includes:

  • Halting a study or experiment to examine the data and then deciding whether to gather more.
  • Collecting data after a study or experiment is finished, with the goal of changing the result.
  • Putting off decisions that could influence calculations, such as whether to include outliers, until after the data have been analyzed.
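
A toy simulation makes the danger concrete. In the sketch below (our illustration, not Andrade’s), every dataset is pure noise, yet analyzing enough of them will eventually produce a “statistically significant” result by chance alone.

```python
# Toy p-hacking simulation: keep testing pure noise until something "works."
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_analyses = 20  # e.g., 20 different outcomes or subgroups to try

group_a = rng.normal(size=(n_analyses, 30))  # no real effect anywhere:
group_b = rng.normal(size=(n_analyses, 30))  # both groups come from the same distribution

for i in range(n_analyses):
    _, p = ttest_ind(group_a[i], group_b[i])
    if p < 0.05:
        print(f"Analysis {i + 1}: p = {p:.3f} -- 'significant' purely by chance")
        break  # stop as soon as a publishable-looking result turns up
else:
    print("No 'significant' result in 20 analyses this time")
```

With 20 independent tests of random data, the chance of at least one p-value falling below 0.05 is about 64%.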

As a real-world example, many news outlets reported on problems found in studies by Cornell University researcher Brian Wansink, who announced his retirement shortly after JAMA, the flagship journal of the American Medical Association, and two affiliated journals retracted six of his papers in 2018.

Stephanie Lee, a science reporter at BuzzFeed News, described emails between Wansink and his collaborators at the Cornell Food and Brand Lab showing they “discussed and even joked about exhaustively mining datasets for impressive-looking results.”

3. Researchers face intense pressure to produce statistically significant results.

Researchers build their careers largely on how often their work is published and the prestige of the academic journals that publish it. “‘Publish or perish’ is tattooed on the mind of every academic,” Ione Fine, a psychology professor at the University of Washington, and Alicia Shen, a doctoral student there, write in a March 2018 article in The Conversation. “Like it or loathe it, publishing in high-profile journals is the fast track to positions in prestigious universities with illustrious colleagues and lavish resources, celebrated awards and plentiful grant funding.”

Because academic journals often prioritize research with statistically significant results, researchers tend to focus their efforts in that direction. Multiple studies suggest journals are more likely to publish papers featuring statistically significant findings.

For example, a paper published in Science in 2014 finds “a strong relationship between the results of a study and whether it was published.” Of the 221 papers examined, about half were published. Only 20% of studies without statistically significant results were published.

The authors learned that most studies without statistically significant findings weren’t even written up, sometimes because researchers, predicting their results would not be published, abandoned their work.

“When researchers fail to find a statistically significant result, it’s often treated as exactly that — a failure,” science writer Jon Brock writes in a 2019 article for Nature Index. “Non-significant results are difficult to publish in scientific journals and, as a result, researchers often choose not to submit them for publication.”

4. Many people — even researchers — make errors when trying to explain statistical significance to a lay audience.

“With its many technicalities, significance testing is not inherently ready for public consumption,” Jeffrey Spence and David Stanley, associate professors of psychology at the University of Guelph in Canada, write in the journal Frontiers in Psychology. “Properly understanding technically correct definitions is challenging even for trained researchers, as it is well documented that statistical significance is frequently misunderstood and misinterpreted by researchers who rely on it.”

Spence and Stanley point out three common misinterpretations, which journalists should look out for and avoid; a short simulation after this list illustrates the problem. Statistical significance, they note, does not mean:

  • “There is a low probability that the result was due to chance.”
  • “There is less than a 5% chance that the null hypothesis is true.”
  • “There is a 95% chance of finding the same result in a replication.”
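
To see why claims like the second one fail, consider this toy simulation (our own illustration, not from Spence and Stanley). Only a small share of the hypotheses being tested reflect real effects, and a large fraction of the “statistically significant” results still come from true null hypotheses, far more than 5%.

```python
# Toy simulation: a p-value below 0.05 does not mean there is less than a 5%
# chance the null hypothesis is true. Parameters here are invented for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n_studies, n_per_group = 5_000, 30

true_effect = rng.random(n_studies) < 0.10    # only 10% of hypotheses are actually true
significant = np.zeros(n_studies, dtype=bool)

for i in range(n_studies):
    shift = 0.5 if true_effect[i] else 0.0    # modest real effect when the hypothesis is true
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(shift, 1.0, n_per_group)
    significant[i] = ttest_ind(b, a).pvalue < 0.05

false_alarms = np.sum(significant & ~true_effect)
print(f"statistically significant results: {significant.sum()}")
print(f"  ...that come from true nulls: {false_alarms} ({false_alarms / significant.sum():.0%})")
```

In this setup, roughly half of the “significant” findings are false alarms, because so few of the tested hypotheses were true to begin with.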

Spence and Stanley offer two suggestions for describing statistical significance. Although both are concise, many journalists (or their editors) might consider them too vague to use in news stories.

If all study results are significant, Spence and Stanley suggest writing either:

  • “All of the results were statistically significant (indicating that the true effects may not be zero).”
  • “All of the results were statistically significant (which suggests that there is reason to doubt that the true effects are zero).”

5. The academic community has debated for years whether to stop checking for and reporting statistical significance.

Scholars for decades have written about the problems associated with determining and reporting statistical significance. In 2019, the academic journal Nature published a letter, signed by more than 800 researchers and other professionals from fields that rely on statistical modelling, that called “for the entire concept of statistical significance to be abandoned.”

The same year, The American Statistician, a journal of the American Statistical Association, published “Statistical Inference in the 21st Century: A World Beyond p < 0.05” — a special edition featuring 43 papers dedicated to the issue. Many propose alternatives to using p-values and designated thresholds to test for statistical significance.

“As we venture down this path, we will begin to see fewer false alarms, fewer overlooked discoveries, and the development of more customized statistical strategies,” three researchers write in an editorial that appears on the front page of the issue. “Researchers will be free to communicate all their findings in all their glorious uncertainty, knowing their work is to be judged by the quality and effective communication of their science, and not by their p-values.”

John Ioannidis, a Stanford Medicine professor and vice president of the Association of American Physicians, has argued against ditching the process. P-values and statistical significance can provide valuable information when used and interpreted correctly, he writes in a 2019 letter published in JAMA. He acknowledges improvements are needed — for example, better and “less gameable filters” for gauging significance. He also notes “the statistical numeracy of the scientific workforce requires improvement.”

Professors Deborah Mayo of Virginia Tech and David Hand of Imperial College London assert that “recent recommendations to replace, abandon, or retire statistical significance undermine a central function of statistics in science.” Researchers need, instead, to call out misuse and avoid it, they write in their May 2022 paper, “Statistical Significance and Its Critics: Practicing Damaging Science, or Damaging Scientific Practice?”

“The fact that a tool can be misunderstood and misused is not a sufficient justification for discarding that tool,” they write.

Need more help interpreting research? Check out the “Know Your Research” section of our website. We provide tips and explainers on topics such as the peer-review process, covering scientific consensus and avoiding mistakes in news headlines about health and medical research.

The Journalist’s Resource would like to thank Ivan Oransky, who teaches medical journalism at New York University’s Carter Journalism Institute and is co-founder of Retraction Watch, and Regina Nuzzo, a science journalist and statistics professor at Gallaudet University, for reviewing this tip sheet and offering helpful feedback.
