If you’re a journalist, you might feel more comfortable with words than numbers. If you’re reading this, you might also be interested in research, which, more often than not, involves math — usually statistics. One of the more important statistical concepts used in interpreting research is effect size, a measure of the strength of an association between two variables — say, an intervention to encourage exercise and the study outcome of blood pressure reduction.
Knowing the effect size will help you gauge whether a study is worth covering. It also will help you explain to audiences what the study’s findings mean. To make effect size as easy to understand as possible, we spoke with Regina Nuzzo, who is the senior advisor for statistics communication and media innovation at the American Statistical Association as well as a freelance science writer and professor of statistics at Gallaudet University.
Researchers can be an invaluable resource in interpreting their study’s key findings. Start by simply asking about effect size: “I think just talking about effect size is a huge win and great for journalists to do,” Nuzzo says. “We [journalists] have not been trained to do that, and some researchers make it hard to do that, and many articles and press releases don’t do it, so you might have to work for it. But it’s so important and rewarding.”
Nuzzo explains that the term effect size can be misleading: it doesn’t actually tell you anything about cause and effect. And effect size alone can’t tell you whether findings are important or newsworthy.
For more on what effect size is, and isn’t, read Nuzzo’s five tips on understanding and interpreting effect size.
Tip #1: Look for the different terms researchers use to describe effect sizes.
“Most of the time in an article… they’re not going to put a big highlight at the top saying, ‘effect size here and here,’” Nuzzo says. “You have to go hunting.”
With that in mind, here are terms that signal “effect size here!”
Some you might recognize: correlation, odds ratio, risk ratio (a.k.a. relative risk), hazard ratio, mean (average) difference.
Some you might be less familiar with: Cohen’s d, Eta-squared, Cramer’s V, R-squared.
Nuzzo offers the following guidance on interpreting the more common types of effect sizes you’ll encounter:
Risk ratio: This is a ratio of the probability of a certain outcome happening in two different groups. For example, suppose a study looked at the incidence of heart attacks in night shift-work nurses compared with nurses who work regular day shifts. To get the risk ratio (RR), which tells you the effect size of night shifts on heart attacks, you take the probability that a night-shift nurse had a heart attack and divide it by the probability a day-shift nurse had a heart attack.
RR = probability of outcome in group A / probability of outcome in group B
“Since it’s a ratio, we can have three different possibilities,” Nuzzo adds. “It can be equal to one, bigger than one, or smaller than one. But it can never be negative.”
- If the RR is equal to 1, that means the risk of a heart attack is the same in both groups.
- If the RR is greater than 1, that means the risk of a heart attack is greater in night-shift workers than day workers.
- If the RR is less than 1, that means the heart attack risk is lower in night-shift workers than day workers.
It’s not too difficult to translate a risk ratio into statistics you can use.
- If the risk ratio is greater than 1, then the difference between the risk ratio and 1 (subtract 1 from the RR) represents how much higher the risk of an outcome is for group A compared with group B.
- For example: RR = 1.5 → 1.5 – 1 = 0.5 → The risk of heart attack is 50% higher in night-shift workers than in regular day-shift workers.
- If the risk ratio is less than 1, then the difference between the risk ratio and 1 (subtract the RR from 1) represents how much lower the risk of an outcome is for group A compared with group B.
- For example: RR = 0.75 → 1 – 0.75 = 0.25 → The heart attack risk is 25% lower in night-shift workers than in day workers.
- You can also flip the risk ratio — just divide 1 by the risk ratio. So a risk ratio of 0.75 for night workers versus day workers is equivalent to a risk ratio of 1.33 (= 1/0.75) for day workers versus night workers. That would mean that the heart attack risk is 33% higher for day workers than night workers.
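If you like to check this kind of arithmetic yourself, here is a minimal Python sketch of the translation above. The nurse heart-attack probabilities are made up for illustration, and the function name is just a placeholder, not anything from the study literature.

```python
def describe_risk_ratio(rr):
    """Translate a risk ratio into '% higher' or '% lower' wording."""
    if rr > 1:
        return f"{(rr - 1) * 100:.0f}% higher risk in group A than in group B"
    elif rr < 1:
        return f"{(1 - rr) * 100:.0f}% lower risk in group A than in group B"
    return "the same risk in both groups"

# Hypothetical numbers, not from any real study:
p_night = 0.03   # probability of a heart attack among night-shift nurses
p_day = 0.02     # probability of a heart attack among day-shift nurses

rr = p_night / p_day                 # risk ratio = 1.5
print(describe_risk_ratio(rr))       # "50% higher risk in group A than in group B"
print(describe_risk_ratio(1 / rr))   # flipping the ratio reverses the comparison
```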
Nuzzo adds that it’s helpful to mention absolute risk along with relative risk. If the absolute risk of a certain outcome occurring is very low, that can help contextualize the reduction or increase you see in terms of relative risk.
Odds ratio: This is similar to the risk ratio, with a slight difference in how the chance of an outcome is measured: it uses odds instead of probability. Odds are the probability that something happens divided by the probability that it doesn't. Think of a coin toss: the odds of getting heads are 1:1, which corresponds to a 50% chance. If something has a 25% chance of happening, the odds are 1:3.
You interpret an odds ratio the same way you interpret a risk ratio. An odds ratio of 1.5 means the odds of the outcome happening in group A are one and a half times the odds of it happening in group B.
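For readers who want to see how a probability becomes odds, here is a small Python sketch; the group probabilities are invented purely for illustration.

```python
def odds(probability):
    """Convert a probability into odds: chance it happens vs. chance it doesn't."""
    return probability / (1 - probability)

# Hypothetical probabilities for two groups, for illustration only:
p_group_a = 0.25   # 25% chance of the outcome -> odds of 1:3
p_group_b = 0.20   # 20% chance of the outcome -> odds of 1:4

odds_ratio = odds(p_group_a) / odds(p_group_b)
print(round(odds(p_group_a), 2))   # 0.33
print(round(odds_ratio, 2))        # 1.33
```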
Hazard ratio: A hazard ratio (HR) compares the yearly risk of death (or some other outcome, e.g., cancer recurrence, heart attack) in two groups over a specific period, Nuzzo explains. The period of time being studied is important, because everyone has a 100% chance of dying at some point in their lives.
Here’s how you’d translate the following hypothetical example into plain language: If you’re looking at a study analyzing daily meat consumption and risk of death over a 20-year time frame, and the hazard ratio is 1.13 for people who eat red meat every day compared with vegetarians, that means that meat eaters have a 13% increased yearly risk of death over the 20-year study period compared with vegetarians. (We got to this percentage the same way we did for risk ratios.)
But what does a 13% increased yearly risk of death over 20 years really mean? Here’s how to calculate the probability a person in the daily meat-eating group will die before a person in the vegetarian group: HR / (1 + HR)
So in this case, you’d do the following: 1.13/(1 + 1.13) = 0.53
That means there’s a 53% chance that a person who eats red meat every day will die before someone who doesn’t eat red meat at all.
As a quick comparison, you can calculate the probability as though eating red meat had no effect (HR = 1): 1/(1 + 1) = 0.5. In this case, the chance the meat lover will die before the meat-abstainer is 50%, essentially a heads-or-tails flip.
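The same HR / (1 + HR) arithmetic, written as a tiny Python sketch for anyone who wants to plug in other hazard ratios (the numbers come from the hypothetical example above):

```python
def prob_dies_first(hazard_ratio):
    """Probability that a person in the exposed group dies before a comparison
    person, using the HR / (1 + HR) formula."""
    return hazard_ratio / (1 + hazard_ratio)

print(round(prob_dies_first(1.13), 2))  # 0.53, the daily red-meat eater in the example
print(round(prob_dies_first(1.0), 2))   # 0.5, no difference: a coin flip
```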
Tip #2: Put effect size into context.
There are guidelines for what statisticians consider small, medium and large effect sizes. For an effect size called Cohen's d, for example, the threshold for small is 0.2, medium is 0.5, and large is 0.8.
But what do small, medium and large really mean in terms of effect size? We might have different frames of reference that we use to interpret these terms. Luckily, statisticians have come up with ways to translate Cohen's d into what's called "common language effect size," as well as effects we can visualize or understand more intuitively. Suppose, for example, that a researcher measures the difference in verbal fluency between teenage boys and teenage girls in a certain neighborhood, and she gets a Cohen's d of 0.9, which is considered a large effect. You can look up this effect size in a table to find that it translates to a 74% chance that a randomly chosen teenage girl in that neighborhood would be more verbally fluent than a randomly chosen teenage boy in the neighborhood.
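Those lookup tables are typically based on the normal distribution: a common conversion assumes both groups' scores are roughly bell-shaped with equal spread and computes the probability as Φ(d/√2). Here is a quick check in Python; it assumes the scipy library is available and that this normal-distribution conversion is the one behind the table.

```python
from math import sqrt
from scipy.stats import norm  # requires scipy

def common_language_effect_size(cohens_d):
    """Probability that a random member of the higher-scoring group outscores
    a random member of the other group, assuming roughly normal scores with
    equal spread in both groups."""
    return norm.cdf(cohens_d / sqrt(2))

print(round(common_language_effect_size(0.9), 2))  # ~0.74, matching the 74% above
```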
Tip #3: Don’t assume effect size indicates causality.
Put simply, effect size cannot prove causation between two variables — that one caused the other to change in some way. It’s just a measure of the strength of the relationship between two things. In general, the larger the effect size, the stronger the relationship. But effect size alone can’t tell you if there’s a causal link between the variables being studied. For example, let’s say a study found that the correlation between leafy vegetable intake and improved sleep quality in children has a large effect size. That doesn’t mean leafy greens cause much better sleep. It just indicates that children who ate a lot of leafy greens had much higher sleep quality than those whose diets were low in greens.
Tip #4: Don’t confuse effect size with statistical significance.
If a result is found to be statistically significant, it’s unlikely to be a chance occurrence. Statistical significance is often understood in terms of p-values. We explained it as follows in an earlier tip sheet:
“P-values quantify the consistency of the collected data as compared to a default model which assumes no relationship between the variables in question. If the data is consistent with the default model, a high p-value will result. A low p-value indicates that the data provides strong evidence against the default model. In this case, researchers can accept their alternative, or experimental, model. A result with a p-value of less than 0.05 often triggers this.
Lower p-values can indicate statistically significant results, but they don’t provide definitive proof of the veracity of a finding. P-values are not infallible — they cannot indicate whether seemingly statistically significant findings are actually the result of chance or systematic errors. Further, the results of a study with a low p-value might not be generalizable to a broader population.”
Just because a result has a small p-value — indicating it’s probably not due purely to chance — does not mean there is a strong relationship between the variables being studied. The effect size reflects the magnitude of the finding.
“You can have something that’s really statistically significant — a tiny p-value — but it also has a tiny effect size. Then you can say, ‘so what?’” Nuzzo says. “It’s really important to look at that effect size — and researchers don’t always brag about it, because sometimes it’s really small. They’d rather say, ‘Oh, but look, my p-value is really good,’ and avoid the fact that, okay, it’s a ‘who cares’ effect size.”
For example, a study might find that the average marital happiness in people who meet on a dating app is higher than the average marital happiness in those who meet in a bar, with a p-value of 0.001, which indicates the finding is statistically significant. But look closer and you might find that the average difference in happiness between the two groups is only 0.2 points on a seven-point scale. This small effect size might not have practical importance.
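To see how that can happen, here is a simulated example in Python with entirely made-up happiness scores (it assumes the numpy and scipy libraries are available). With 5,000 couples per group, a true difference of only 0.2 points produces a tiny p-value but a small effect size.

```python
import numpy as np
from scipy import stats  # requires numpy and scipy

rng = np.random.default_rng(0)

# Simulated, not real, happiness scores on a 7-point scale, 5,000 couples per
# group, with a true difference of only 0.2 points:
app_group = rng.normal(loc=5.2, scale=1.5, size=5000)
bar_group = rng.normal(loc=5.0, scale=1.5, size=5000)

t_stat, p_value = stats.ttest_ind(app_group, bar_group)
cohens_d = (app_group.mean() - bar_group.mean()) / np.sqrt(
    (app_group.var(ddof=1) + bar_group.var(ddof=1)) / 2
)

print(f"p-value: {p_value:.4g}")      # very small: "statistically significant"
print(f"Cohen's d: {cohens_d:.2f}")   # around 0.13: a small effect size
```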
Tip #5: Remember that effect size is not the sole indicator of a study's importance or newsworthiness.
The things you should be looking for besides the effect size are:
- How the effect size compares to others in a particular field. In educational or psychological studies, small effect sizes might be the norm, because of the difficulties associated with trying to measure or change behavior. On the other hand, randomized trials in medicine commonly have bigger effect sizes, because drugs tested in tightly controlled settings can have large effects. "It's important to look at it in the context of that field," Nuzzo says. "I think journalists can definitely push researchers and say, 'Okay, what are other effect sizes in this field, in this particular area?'" She also suggests asking the researchers questions that bring the findings closer to the individual level, including: "What percent of the sample did this treatment actually work for? What percent did it not help at all? What percent got worse?"
- Whether the result is unexpected. “If it goes against everything that we know about theory or past experience, it might be telling us something really cool and really unexpected, so it’s fine if it’s a small effect size,” Nuzzo adds.
- If a study focuses on an intervention, the cost of the intervention might be noteworthy. “Maybe it’s a small effect, but this is a super cheap and simple and easy intervention, so it’s worth writing about,” she says.
For more guidance on reporting on academic research, check out our tips for deciding whether a medical study is newsworthy.