In the age of the Internet, why is so much research inaccessible?
On January 6, 2011, 24-year-old hacker and activist Aaron Swartz was arrested by police near the Massachusetts Institute of Technology for downloading several million articles from an online archive of research journals called JSTOR. After Swartz committed suicide in January 2013, questions were raised about why MIT, whose access to JSTOR he exploited, chose to pursue charges, and about what motivated the U.S. Department of Justice to demand jail time for his transgression.
But the question that should have been asked is why, decades after the birth of the Internet, it is a felony to download works that academics chose to share with the world. The Internet, after all, was invented so that scientists could communicate their research results. But while you can now get immediate, free access to millions of cat videos, scholarly literature, one of the greatest public works projects of all time, remains locked behind expensive paywalls.
Every year, universities, governments, and other organizations spend $10 billion or more to buy back access to papers their researchers have given to journals for free, while most teachers, students, health care providers, and members of the public are left out in the cold. Even worse, the journals’ stranglehold on academic publishing means that in an era when anyone can share anything with the entire world at the click of a button, it takes a typical paper nine months to be published. These delays slow progress and in many cases literally cost lives.
Scholarly journals as we know them are a product of the 19th century: Science, Nature, The New England Journal of Medicine, The Journal of the American Medical Association, and The Lancet all published their first editions in the 1800s.
The journals were enabled by the technologies of the industrial revolution—steam-powered rotary printing presses and efficient rail-based mail service. But printing and shipping articles was expensive, and because of this, the key features of modern journals were established: They limited what they printed to those works deemed to be of the greatest interest to the journal’s target audience; and they sold subscriptions, sending copies only to those who had paid.
By 1990 there were about 5,000 journals in circulation, and the costs were skyrocketing. If you were lucky enough to be at a major research university, you could find most of them in the library. But most scientists had to make do with a small subset.
Then along came the Internet.
Scientific journals, serving a computer-savvy audience who had access to fast Internet connections, were among the first commercial ventures to take advantage of the new technology. Within a few years virtually all major publishers had put versions of their printed journals online.
Rather than adapting their business model to the new medium, however, journal publishers stuck with the same subscription-based print model. And why not—scientists were still giving them papers, and universities were still buying them back.
To understand just how crazy this system is, consider a typical Berkeley scientist. The state pays her salary and provides her workspace. When she has a new idea, she raises money for the equipment, supplies, and staff who will do the work. In all likelihood this money comes from government agencies such as the National Institutes of Health and the National Science Foundation. This scientist and her students then spend a great deal of time, usually years, pursuing the idea until they finally have a result they want to share.
So they sit down and write a paper describing what they did, how they did it, what they found, and what they think it means. And they submit it to one of the more than 11,000 journals currently in operation, choosing based on the journal’s scope and perceived importance.
With few exceptions, these journals all work the same way. The paper is assigned to an editor—usually a practicing scientist volunteering time—who sends it to other volunteer scientists deemed best qualified to evaluate the author’s methods, data, and conclusions. These reviewers render their opinion on its technical merits and suitability to the journal in question. The editor looks at all the reviews and decides whether to accept, modify, or reject the work. If the paper is accepted, the journal converts the manuscript into publishable form and posts it on the Web. If not, the scientists send their work to another journal, thus reprising the entire process.
Note just how little the journal does. It provides the infrastructure for peer review, oversees the process, and prepares the paper for publication. This is a tangible contribution, but it pales in comparison to the labors of the scientists and the support from the research’s funders and sponsors.
And yet, for their modest role, publishers are rewarded with ownership—in the form of copyright—of the finished, published work, which they turn around and lease back to the same institutions that produced it. It would be funny if it weren’t so tragically insane.
And the consequences are severe. Many physicians and health care providers lack access to basic medical research, as do students and teachers at high schools and small colleges who must resort to textbooks or Wikipedia rather than the primary research literature. Perhaps worst of all, patients lack access to the highest quality research on their condition—research their taxes have paid for.
It doesn’t have to be this way. In the 1990s, several people began promoting a simple alternative model for science publishing. The idea was to treat it as a service, with publishers getting paid a fee for their services, after which the finished product would effectively enter the public domain. One of the people pushing this new model, now known as “open access,” was my postdoctoral advisor at Stanford, Pat Brown, who enlisted me in his crusade. After failing to convince existing publishers to adopt this model, the two of us, along with former NIH director Harold Varmus, launched a nonprofit publisher that we dubbed the Public Library of Science, or PLOS. We were determined to prove that this model would work.
After all, universities were already forking over billions of dollars to support publishers. We were offering them a better deal: access for everyone, at a lower price. But even with logic and value on our side, only a small group of pioneers joined us.
Why? Academia is an industry of prestige, and the currency in which prestige is traded is journal titles. In most scientists’ minds, publication in an elite journal such as Nature or Science is as good as gold—a ticket to a job, grants, and tenure.
With that in mind, PLOS launched with two journals that adopted the same elitist editorial policies as Science, Nature, and their ilk: PLOS Biology for basic life sciences and PLOS Medicine for the clinical world. We hired professional editors, built fancy editorial boards, and had a suite of Nobel Prize winners singing our praises.
But prestige is a difficult thing to engineer, and colleagues continued to send their high-profile papers to the same old subscription journals. When I suggested they were chicken, they would complain that their jobs, or their trainees’ jobs, were at stake. I didn’t think they were right, but I had little evidence to prove otherwise. While we were launching PLOS, I was starting my own lab at Berkeley. Senior colleagues, knowing about my extracurricular activities, warned me that I would never get grants or tenure unless I published my work in the old-guard journals, and that I would ruin my trainees’ careers if I put my principles ahead of practical realities.
I didn’t want to believe them. I wanted to believe that good work would get noticed, that success in science did not require capitulating to stupid traditions. I also knew I’d look like a hypocrite if I failed to live up to my own exhortations.
So I vowed to make all my studies freely available. I have stuck to my pledge. And you know what? The sky didn’t fall. I got grants. Then I got a tenure-track job at Berkeley (I started out at Lawrence Berkeley National Laboratory, the Lab up the hill). Then I got tenure. And then I was named an investigator with the Howard Hughes Medical Institute, a coveted award that now funds most of my research. The people in my lab have not suffered. My grad students received fellowships and went on to land plum postdoctoral positions (except the one who went to Facebook and is now a millionaire), and my postdocs attained faculty positions at good schools. Sadly, few colleagues followed my lead.
Fortunately, publishing decisions are not entirely in the hands of individual investigators. In 2008, under pressure from Congress, the NIH—which funds about $30 billion of research every year—implemented a policy requiring that grantees make their work available through the National Library of Medicine. For perhaps the first time, a major funding agency made it a condition of receiving a grant that authors make their works available to the public. The policy has been successful: 80 percent of NIH-funded works published in 2011 are now freely available online. There’s nothing like the threat of losing funding to get people to do the right thing.
Unfortunately, under heavy lobbying pressure from publishers, the NIH policy allows a delay of up to a year between publication and free access. Though better than nothing, delayed access to the literature no more provides the public with the latest advances in biomedical research than handing out year-old copies of The New York Times keeps everyone up to date on world events.
Earlier this year, the Obama administration weighed in on the matter, directing other federal agencies that fund large amounts of research to develop their own public access policies. Unfortunately, if predictably, the new policy all but enshrined the NIH’s one-year delay, explicitly citing as an excuse the need to sustain the subscription-based publishing business.
But at least the White House did something. The other major players in this arena, the universities, have been mostly silent. Like funding agencies, universities could hasten the transition to full and immediate open access by making it a condition of employment. Doing so would lower costs and make the research done on campuses more efficient and effective. The universities’ failure to do so is an astonishing abdication of their public mission and responsibility as stewards of scholarship.
The situation is changing. In 2006, PLOS ONE was launched. It not only provided open access to all content, but also dispensed with the notion that journals should select only papers of the highest level of interest to their readers. PLOS ONE asks its reviewers only to assess whether the paper is a legitimate work of science. If it is, it is published. The process is relatively simple; there’s no need to ping-pong from one journal to another in order to find the highest-impact home.
This idea evidently appeals to the scientific community. PLOS ONE will publish in excess of 25,000 articles this year and, though only six years old, is now the biggest biomedical research journal in the world. It publishes great science: PLOS ONE articles are routinely discussed by both science journalists and the popular press. And PLOS ONE is turning a profit, a fact that has caught the eye of commercial and nonprofit publishers worldwide.
Being able to access papers is just the beginning. We can now finally start to take advantage of computers and the Internet not just to make scientific publishing open, but to make it better. The multilayered, hyperlinked structure of the Web was made for scientific communication, yet papers today are largely distributed and read as static PDFs. We are working with the community to enable the “paper of the future” that embeds not only things like movies, but access to raw data and the tools used to analyze that data.
There is also no need for papers to be static works, fixed in a single form at their time of publication. Good data and good ideas in science are constantly evolving, and scientific papers should evolve over time as new data, analyses, and ideas emerge—whether they support or refute the original assertions.
But the biggest target of our efforts is peer review—the closest thing science has to a religious doctrine. Attempts to upend, reform, or even tinker with peer review are regarded as apostasies. The truth is that peer review as practiced in the 21st century poisons science. It is conservative, cumbersome, capricious, and intrusive. It encourages groupthink, slows down the communication of new ideas and discoveries, and cedes undue power to a handful of journals, which then stand as gatekeepers to success in the field.
Each round of reviews takes a month or more, and it is rare for papers to be accepted without additional experiments, analyses, and rewrites, which take months or years to accomplish. The delay might be worth it if it made the ultimate product better. But it doesn’t. Peer-reviewed literature is filled with all manner of crappy papers. Even the supposedly more rigorous standards of the elite journals fail to prevent flawed papers from appearing in their pages. Such flaws are often revealed only after publication, but as it stands, we have no effective way to annotate previously published papers that have turned out to be wrong.
As for classifying research, does anyone really think that assigning every paper to one of 11,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is the best way to help people browse the literature? This is a relic of a bygone era, an artifact of the historical accident that Gutenberg invented the printing press before Al Gore “invented” the Internet.
So what would be better? The outlines of an ideal system are simple. There should be no journal hierarchy, only broad journals like PLOS ONE. When papers are submitted, they should immediately be made available online for free—clearly marked as not yet reviewed, but there to be used by people in the field who are capable of deciding on their own whether the work is sound and important.
The journal would then organize a different type of peer review, in which experts are asked about a paper’s technical soundness—as we currently do at PLOS ONE—and about its appropriate scientific audience and its relative importance. This assessment would then be attached to the paper, there for everyone to see.
This process would capture all the value of the current peer-review system while shedding most of its flaws. By replacing the current journal hierarchy with a structured classification of research areas and levels of interest, the new system would undermine the generally poisonous “winner-take-all” attitude associated with publication in Science, Nature, and their ilk. And by reducing the weight placed on the assessment made at the time of publication, it would facilitate the development of a robust system of postpublication peer review in which individuals or groups could submit their own assessments of papers at any point after they were published. Papers could be updated to respond to comments or to new information, and we would finally make the published scientific literature as dynamic as science itself.
If we build such a system, then maybe the next time someone like Aaron Swartz tries to access every scientific paper ever written, instead of finding federal agents, they’ll find a big green button that says “Download Now.”
Michael Eisen is a Berkeley biologist and an Investigator of the Howard Hughes Medical Institute. He delivered a version of the above to the Commonwealth Club in March.