Researchers use decades of Wall Street Journal articles to predict stock market returns

Combing through decades of Wall Street Journal articles, researchers created a new way to gauge stock performance that tracks real returns well.

Financial news articles can be a good short-term indicator of why the U.S. stock market is doing well or poorly, finds a new working paper, “Business News and Business Cycles,” from the National Bureau of Economic Research.

Based on a full-text analysis of 763,887 Wall Street Journal articles published from 1984 to 2017, the authors find that news coverage of particular topics, like signs of a looming recession, predicts 25% of average fluctuations in stock market returns.

The data represent “among the most extensive text corpora of business news studied in the economics literature to date,” the authors write, adding that their approach is “motivated by the view that news text is a mirror of the state of the economy.”

Stock markets operate like any other market. Prices are determined by supply and demand. For individual companies, large price swings can happen for intuitive, obvious reasons. If company executives are caught in a scandal, investors devalue that company’s stock and sell it off. When demand drops, so does price.

But, outside of major news affecting an individual company or even an entire industry, “trying to understand what’s going on in the economy at any given time is a really central problem for our field, and having good measurements of that is really valuable,” says Yale University doctoral student Leland Bybee, one of the paper’s authors along with professors Bryan Kelly at Yale, Asaf Manela at Washington University in St. Louis and Dacheng Xiu at the University of Chicago.

Imagine being asked to predict overall S&P 500 gains and losses over time. Your only information to make those guesses comes from Wall Street Journal articles. Also, you are a computer, so you can read decades of news stories in seconds.

Editors and reporters at news outlets often decide what topics to cover based on the interests of their core readers. Many of the Journal’s core readers are investors or people generally interested in economic affairs. Founded in 1889, the Journal is regarded as a major paper of record for national financial news.

Journal reporters also have sources — economists, analysts, business owners, workers — who provide on-the-ground, real-time insights on what’s happening in the U.S. economy. In short, reporters have access to information their readers want to know.

That information is conveyed through written news.

The authors organized the Journal articles by topic, then predicted what aggregate S&P 500 returns would look like based on the topics the Journal was covering. Coverage of economic events that might affect market returns, like recessions, fluctuates over time. When the economy is doing well, fewer stories use the word “recession.” An uptick in recession-related stories would, for example, lead their model to predict lower overall S&P 500 returns.
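To make the idea of news attention concrete, here is a minimal Python sketch that measures what share of each month's articles mention a topic. It is a deliberate simplification: it uses plain keyword matching as a stand-in for the authors' topic analysis, and the headlines, the `ARTICLES` list and the `attention_share` function are hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical toy headlines as (month, text) pairs, invented for illustration.
ARTICLES = [
    ("2008-09", "Fears of recession grow as banks falter"),
    ("2008-09", "Recession worries hit retailers"),
    ("2008-09", "Oil prices slide on weak demand"),
    ("2017-03", "Tech earnings beat forecasts"),
    ("2017-03", "Economists see little recession risk"),
    ("2017-03", "Airlines add routes for summer"),
    ("2017-03", "Retail sales climb again"),
]


def attention_share(articles, keyword):
    """Return {month: share of that month's articles containing keyword}."""
    totals = defaultdict(int)  # articles published per month
    hits = defaultdict(int)    # articles per month that mention the keyword
    for month, text in articles:
        totals[month] += 1
        if keyword in text.lower():
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in totals}


shares = attention_share(ARTICLES, "recession")
for month in sorted(shares):
    print(month, round(shares[month], 2))
```

In this toy sample, two of three September 2008 headlines mention a recession, compared with one of four in March 2017. A monthly series like this is the kind of signal that could then be compared with market returns.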

Each day, the 505 publicly traded firms that make up the S&P 500 index gain or lose value, or stay roughly the same. For their 33-year sample of news articles, the authors compared their predictions of monthly S&P 500 returns with actual monthly S&P 500 returns. Across all months in their sample, predictions based on the Journal articles account for about one-quarter of the variation in actual returns, on average.
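One common way to express how much of the movement in returns a set of predictions accounts for is the R-squared statistic. The Python sketch below fits a straight line from a single news-attention series to monthly returns and computes R-squared. It is an illustration under stated assumptions, not the authors' model: the numbers are invented, and `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(actual, predicted):
    """Share of the variation in actual values accounted for by predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented numbers for illustration only, not data from the paper.
recession_share = [0.05, 0.10, 0.30, 0.20, 0.08, 0.15]  # share of articles
monthly_return = [1.2, -0.5, -2.0, 1.5, 0.3, -1.0]      # percent

slope, intercept = fit_line(recession_share, monthly_return)
predicted = [intercept + slope * s for s in recession_share]
r2 = r_squared(monthly_return, predicted)

print("slope:", round(slope, 2))
print("R-squared:", round(r2, 2))
```

An R-squared of 0.25 would mean the predictions account for one-quarter of the variation in returns. In this toy data the slope is negative: more recession coverage goes with lower returns.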

That makes Journal coverage a stronger indicator and potential short-term predictor of market performance than even certain federal macroeconomic data, according to the authors.

Simply put, “a big part of why the market goes up or down is captured by things being discussed in the Wall Street Journal,” Bybee says. He adds that the quality of the paper’s results is directly related to the quality of the journalism underlying the data.

“In order to get this really good measure of the state of the economy, it needs to be the case that journalists find the information that matters,” he says.

The authors identified 180 topic areas across the Journal articles, excluding non-economics topics, like sports, leisure and the arts. This particular number of topics seemed to hit the right note. Consider the topic of executive pay. The authors found that an analytical model with only 50 topics captured articles unrelated to executive pay, such as those on the drop in flights after terrorist events. The model with 180 topics “achieves a successful separation of distinct subjects,” they write.

The topics reveal patterns of news judgment decisions by Journal editors and reporters. Recession and health insurance are intensely covered during certain periods, and covered relatively rarely during other periods. Stories about health insurance, for example, peaked around President Bill Clinton’s September 1993 speech to Congress on overhauling the nation’s health care system. Health insurance news coverage also spiked around the Affordable Care Act debate from 2008 to 2010, and Republican rhetoric about repealing Obamacare during the 2016 presidential race.

The “elections” topic, by contrast, shows a regular, seasonal pattern, with coverage ramping up and spiking during presidential races. Likewise, stories on “earnings forecasts” jump prior to company earnings announcements and conference calls with analysts, which happen more or less regularly, roughly every three months.

News attention is also a strong predictor of other important macroeconomic measures, the authors find. The Journal publishing more or fewer stories about a recession, for example, strongly predicts industrial production and employment outcomes, “more so than pretty much any other quantitative measure out there,” Bybee says. Same for stories about global oil markets, though those are not as strong a predictor of production and employment as recession-related articles. Increases in stories related to small businesses are linked to less overall market volatility.

The dataset the authors created based on Journal article text is among the most comprehensive of its kind. If a body of data explains a portion of stock market volatility, those data could be used to predict future volatility. “The maturity of a science is often gauged by its success in predicting important phenomena,” UCLA applied finance professor emeritus Richard Roll wrote in a 1988 paper, titled “R²,” in The Journal of Finance.

Roll wrote that paper when computational power was a fraction of what it is today and the concept of accessing information on the internet was nonexistent for most people. Still, he tracked mentions of 96 large firms in Journal articles and the Dow Jones news wire from 1982 to 1986. Roll looked at dates those firms weren’t mentioned in the news and added that data to a larger predictive model incorporating firm size, industry and other factors. The news — or lack of it — didn’t help explain market volatility in Roll’s model.

Manela, one of the current paper’s authors, co-wrote a paper in 2017 based on a text analysis of Journal news over a longer period, from 1889 to 2009. Those authors found that news about economic volatility “predicts high future returns in normal times and rises just before transitions into economic disasters.” But that database incorporated only abstracts of front-page articles.

“This is one of the first attempts to really quantify the data in the way that we’ve done,” Bybee says. A real-time model incorporating past and current Journal articles, as well as stories from other financial news outlets, TV, radio, social media and alternative news “would be the Holy Grail,” he adds.

Explore visualizations of the Wall Street Journal’s historical news judgment and attention to various economics topics — plus download the data — at structureofnews.com.
