How computers broke science – and what we can do to fix it

Reproducibility is one of the cornerstones of science. Made popular by the British scientist Robert Boyle in the 1660s, the idea is that a discovery should be reproducible before being accepted as scientific knowledge.

In essence, you should be able to produce the same results I did if you follow the method I describe when announcing my discovery in a scholarly publication. For example, if researchers can reproduce the effectiveness of a new drug at treating a disease, that’s a good sign it could work for all sufferers of the disease. If not, we’re left wondering what accident or mistake produced the original favorable result, and we would doubt the drug’s usefulness.

For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer – and the point-and-click software programs that have evolved to make it more user-friendly – the reproducibility of much research has become questionable, if not impossible. Too much of the research process is now shrouded by the opaque use of computers that many researchers have come to depend on. This makes it almost impossible for an outsider to recreate their results.

Recently, several groups have proposed similar solutions to this problem. Together they would break scientific data out of the black box of unrecorded computer manipulations so independent readers can again critically assess and reproduce results. Researchers, the public, and science itself would benefit.

Computers wrangle the data, but also obscure it

Statistician Victoria Stodden has described the unique place personal computers hold in the history of science. They’re not just an instrument – like a telescope or microscope – that enables new research. The computer is revolutionary in a different way: it’s a tiny factory for producing all kinds of new “scopes” to see new patterns in scientific data.

It’s hard to find a modern researcher who works without a computer, even in fields that are not intensely quantitative. Ecologists use computers to simulate the effect of disasters on animal populations. Biologists use computers to search massive amounts of DNA data. Astronomers use computers to control arrays of telescopes, and then to process the collected data. Oceanographers use computers to combine data from satellites, ships and buoys to predict global climate. Social scientists use computers to discover and predict the effects of policy or to analyze interview transcripts. Computers help researchers in almost every discipline identify what’s interesting within their data.

Computers also tend to be personal instruments. Typically we have exclusive use of our own, and the files and folders it contains are generally considered a private space, hidden from public view. Preparing the data, analyzing it, visualizing the results – these tasks are done on the computer, in private. Only at the very end of the pipeline is there a publicly visible journal article summarizing all the private tasks.

The problem is that most modern science is so complicated, and most journal articles so brief, that it’s impossible for the article to include details of many important methods and decisions made by the researcher as they analyzed their data on their computer. How, then, can another researcher judge the reliability of the results, or reproduce the analysis?

How much transparency do scientists owe?

Stanford statisticians Jonathan Buckheit and David Donoho described this issue as early as 1995, when the personal computer was still a fairly new idea:

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

They make a radical claim. It means all those private files on our personal computers, and the private analysis tasks we do as we work toward preparing for publication, should be made public along with the journal article.

This would be a huge change in the way scientists work. We’d need to prepare from the start for everything we do on the computer to eventually be made available for others to see. For many researchers, that’s an overwhelming thought. Victoria Stodden has found the biggest objection to sharing files is the time it takes to prepare them by writing documentation and cleaning them up. The second biggest concern is the risk of not receiving credit for the files if someone else uses them.

A new toolbox to enhance reproducibility

Recently, several different groups of scientists have converged on recommendations for tools and methods to make it easier to keep track of files and analyses done on computers. These groups include biologists, ecologists, nuclear engineers, neuroscientists and political scientists. Manifesto-like papers lay out their recommendations. When researchers from such different fields converge on a common course of action, it’s a sign that a major watershed in doing science might be under way.

One major recommendation: minimize and replace point-and-click procedures during data analysis as much as possible by using scripts that contain instructions for the computer to carry out. This solves the problem of ephemeral mouse movements, which leave few traces, are difficult to communicate to other people, and are hard to automate. Such movements are common during data cleaning and organizing tasks in a spreadsheet program like Microsoft Excel. A script, on the other hand, contains unambiguous instructions that can be read by its author far into the future (when the specific details have been forgotten) and by other researchers. It can also be included with a journal article, since scripts are not big files. And scripts can easily be adapted to automate research tasks, saving time and reducing the potential for human error.

We can see examples of this in microbiology, ecology, political science and archaeology. Instead of mousing around menus and buttons, manually editing cells in a spreadsheet and dragging files between several different software programs to obtain results, these researchers wrote scripts. Their scripts automate the movement of files, the cleaning of the data, the statistical analysis, and the creation of graphs, figures and tables. This saves a lot of time when checking the analysis or redoing it to explore different options. And by looking at the code in the script file, which becomes part of the publication, anyone can see the exact steps that produced the published results.
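To make this concrete, here is a minimal sketch of what such a scripted pipeline can look like in Python, using only the standard library. The file names and the measurement column (`mass_g`) are hypothetical stand-ins for a real project’s data, not taken from any of the studies mentioned above.

```python
import csv
import statistics
from pathlib import Path

def tidy(raw_csv, tidy_csv):
    """Drop incomplete rows -- the step often done by hand in a spreadsheet."""
    with open(raw_csv, newline="") as f:
        rows = [r for r in csv.DictReader(f) if all(v.strip() for v in r.values())]
    with open(tidy_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

def summarize(tidy_csv, column):
    """Compute simple summary statistics for one measurement column."""
    with open(tidy_csv, newline="") as f:
        values = [float(r[column]) for r in csv.DictReader(f)]
    return {"n": len(values), "mean": statistics.mean(values)}

def run_pipeline(raw_csv, workdir, column):
    """Run every step in order, so the whole analysis is one repeatable command."""
    tidy_csv = Path(workdir) / "tidy.csv"
    tidy(raw_csv, tidy_csv)
    return summarize(tidy_csv, column)
```

Because every step is written down in the script, a reader can rerun the pipeline on the published data and check whether the reported numbers reappear – something no sequence of mouse clicks can offer.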

Other recommendations include the use of common, nonproprietary file formats for storing files (such as CSV, or comma separated values, for tables of data) and simple rubrics for systematically organizing files into folders, to make it easy for others to understand how the information is structured. They recommend free software that is available for all computer systems (e.g. Windows, Mac and Linux) for analyzing and visualizing the data (such as R and Python). For collaboration, they recommend a free program called Git, which helps to track changes when many people are editing the same document.
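A folder rubric can itself be set up by a short script. The sketch below shows one common convention – raw data kept separate from derived data, with a README explaining the structure – but the exact layout is a choice, not a fixed standard.

```python
from pathlib import Path

# One common layout for a research project; adjust the names to suit your field.
LAYOUT = ["data/raw", "data/derived", "scripts", "figures", "manuscript"]

def make_compendium(root):
    """Create the standard folders plus a README describing the structure."""
    root = Path(root)
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text(
        "Raw data in data/raw is read-only; everything in data/derived "
        "is regenerated by the files in scripts/.\n"
    )
    # Return the created structure so it can be inspected or documented.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Treating the raw data folder as read-only means every derived file can be regenerated by rerunning the scripts, and the whole folder can then be tracked with Git and shared as a unit.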

Currently, these are the tools and methods of the avant-garde, and many mid-career and senior researchers have only a vague awareness of them. But many undergraduates are learning them now. Many graduate students, seeing personal advantages in getting organized, using open formats, free software and streamlined collaboration, are seeking out training and tools from volunteer organizations such as Software Carpentry, Data Carpentry and rOpenSci to fill the gaps in their formal training. My university recently created an eScience Institute, where we help researchers adopt these recommendations. Our institute is part of a bigger movement that includes similar institutes at Berkeley and New York University.

As students learning these skills graduate and progress into positions of influence, we’ll see these standards become the new normal in science. Scholarly journals will, in time, require code and data files to accompany publications. Funding agencies will, in time, require that they be placed in publicly accessible online repositories.

Open formats and free software are a win-win

This change in the way researchers use computers will be beneficial for public engagement with science. As researchers become more comfortable sharing more of their files and methods, members of the public will have much better access to scientific research. For example, a high school teacher will be able to show students the raw data from a recently published discovery and walk the students through the main parts of the analysis, because all of these files will be available with the journal article.

Similarly, as researchers increasingly use free software, members of the public will be able to use the same software to remix and extend results published in journal articles. Currently, many researchers use expensive commercial software programs, the cost of which makes them inaccessible to people outside of universities or large corporations.

Of course, the personal computer is not the sole cause of problems with reproducibility in science. Poor experimental design, inappropriate statistical methods, a highly competitive research environment and the high value placed on novelty and publication in high-profile journals are all to blame.

What’s unique about the role of the computer is that we have a solution to the problem. We have clear recommendations for mature and well-tested tools and methods, borrowed from computer science research, to improve the reproducibility of research done by any kind of scientist with a computer. With a small investment of time to learn these tools, we can help restore this cornerstone of science.