Massive and growing volume of free research on the Web: 27 million documents and counting

Distribution of open-access research (PLoS)

Just a decade or so ago, most of the fruits of academia were confined to library shelves and narrow circles of specialists. Journals and the studies within them sat well beyond the immediate reach of the public and the media.

Now, with the rise of the Web and powerful search engines such as Google Scholar and Microsoft Academic Search, more and more of the world’s deepest knowledge is accessible, at least in summary form, to a global audience. And through the open access movement, more scholarship is open to the public, without paywalls, through institutions such as the Social Science Research Network (SSRN), the Public Library of Science (PLoS), Harvard’s DASH database and MIT’s DSpace. The PubMed database indexes more than 20 million health studies, an increasing percentage of which are made open by government mandate. As a result, more data and research insights are now “deadline friendly” for anyone doing public communications on the Web, conducting business or exploring personal questions about health, for example. This sea change in the availability of empirical knowledge and research data opens vast possibilities that have yet to be fully explored.

A 2013 European Commission report found that, among new papers being published, perhaps half are now free. Further, a 2014 study published in PLoS One, “The Number of Scholarly Documents on the Public Web,” used computer science techniques to estimate the total amount of research knowledge available on the Web. Authors Madian Khabsa and C. Lee Giles of Penn State examined the availability of studies through Google Scholar and Microsoft Academic Search to arrive at an estimate. The category of scholarly documents, according to the authors, included “journal and conference papers, dissertations and masters theses, books, technical reports and working papers.”

The study’s findings include:

  • As of 2013, when the scientists used algorithms to make their estimates, there were at least 114 million English-language studies available on the Web.
  • Of these 114 million, 27 million were open access — meaning that about one-quarter of online research knowledge in the English-speaking world is now free to the public on the Web.
  • There were significant differences, however, in the availability of papers across disciplines. Perhaps surprisingly, some of the disciplines connected with the most profitable industries had the highest percentages of open papers: 50% for computer science; 42% for business and economics; 35% each for geosciences and physics. (Materials science, agricultural science and engineering, however, were each estimated to have only 12% of their papers open to the public.)
  • By contrast, only 19% of social science studies were found to be open access.
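
The "about one-quarter" figure above follows directly from the two headline numbers. As a quick check, here is a minimal sketch of the arithmetic; the variable and function names are illustrative and do not come from the study.

```python
# Headline figures as reported in the article, in millions of documents.
total_docs_millions = 114
open_access_millions = 27


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


open_share = share(open_access_millions, total_docs_millions)
print(f"Open-access share of English-language studies: {open_share}%")
```

The result is a little under 24%, which the article rounds to about one-quarter.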

The authors note that academic research, open access or not, is not uniformly available on all search engines. For example, at the time of the study, Google Scholar indexed approximately 100 million of the 114 million studies available on the Web — 87%. This being the case, they write, “it would be useful for researchers to consider as a standard practice querying multiple databases and academic search engines in order to gain the most comprehensive result for their query.”
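
For readers who collect results programmatically, the authors' advice to query multiple sources can be sketched as a simple merge step. The example below is only an illustration under stated assumptions: it assumes each search tool returns records with a DOI, the record layout and function names are hypothetical, and the sample records are placeholders (only the first DOI, taken from the citation at the end of this article, is real). It does not call any actual search engine API.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def merge_results(*result_lists):
    """Combine result lists, keeping the first record seen for each DOI."""
    merged = {}
    for results in result_lists:
        for record in results:
            merged.setdefault(normalize_doi(record["doi"]), record)
    return list(merged.values())


# Hypothetical result lists from two different academic search tools.
engine_a = [
    {"doi": "10.1371/journal.pone.0093949",
     "title": "The Number of Scholarly Documents on the Public Web"},
    {"doi": "10.0000/example.1", "title": "Placeholder study A"},  # made-up DOI
]
engine_b = [
    {"doi": "doi:10.1371/JOURNAL.PONE.0093949",
     "title": "The Number of Scholarly Documents on the Public Web"},
    {"doi": "10.0000/example.2", "title": "Placeholder study B"},  # made-up DOI
]

combined = merge_results(engine_a, engine_b)
print(f"{len(combined)} unique studies from {len(engine_a) + len(engine_b)} raw results")
```

Matching on DOI is one reasonable choice among several. Records without a DOI would need another key, such as a normalized title.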

Related: For those interested in engaging more with the world of scholarly research, see the following tip sheets: “Interpreting Academic Studies” and “Statistical Terms Used in Research Studies.”

Keywords: research, open access

Last updated: October 15, 2014



Citation: Khabsa, Madian; Giles, C. Lee. “The Number of Scholarly Documents on the Public Web,” PLoS One, May 2014. doi: 10.1371/journal.pone.0093949.