[R] Why software fails in scientific research

Sharpie chuck at sharpsteen.net
Mon Mar 1 01:23:24 CET 2010

John Maindonald wrote:
> I came across this notice of an upcoming webinar.   The issues identified
> in the 
> first paragraph below seem to me exactly those that the R project is
> designed
> to address.  The claim that "most research software is barely fit for
> purpose 
> compared to equivalent systems in the commercial world" seems to me not
> quite accurate!  Comments!
> A Crack in the Code: Why software fails in scientific research, and how to
> fix it. 
> Thursday, March 25, 2010, 3:00 PM GMT 
> http://physicsworld.com/cws/go/webinar9
> "
> In the 60 years since the invention of the digital computer, millions of
> lines of code have been developed to support scientific research. Although
> an increasingly important part of almost all research projects, most
> research software is barely fit for purpose compared to equivalent systems
> in the commercial world. The code is hard to understand or maintain,
> lacking documentation and version control, and is continually
> ‘re-invented’ as the code writers move on to new jobs. This represents a
> tremendous waste of the already inadequate resources that are put into its
> development. We will investigate how this situation has come about, why it
> is important to the future of research, and what can be done about it. 
> Robert McGreevy will draw on his extensive experience at the STFC ISIS
> Facility, and explain how these issues are being addressed for the benefit
> of research science globally. Nicholas Draper, consultant at Tessella,
> will then expand on this, using the example of the Mantid project at ISIS. 
> "
> Tessella (www.tessella.com) is a technology and consultancy firm, based in
> Oxford.
> ISIS (International Species Information System) (www.isis.org) has as its
> mission the facilitation of "international collaboration in the collection
> and sharing of knowledge on animals and their environments for zoos,
> aquariums and related organizationsvalues the use of objective data to
> benefit conservation, science, animal welfare, education, and collection
> management."
> John Maindonald             email: john.maindonald at anu.edu.au
> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm

I personally feel that a lot of this is a result of failing to publish the
code that was developed to perform research along with the results of the
research.  When setting out to start a new project, one can dig up tons
of journal articles that will happily explain how data was gathered, what
equations were used and wrap it all up with nicely formatted tables and
graphs that show X is correlated to Y.

What these articles fail to report is the code that was developed to filter
and process the raw data and then apply the equations to produce the figures
and tables.  The next generation of researchers seeking to extend
the results then end up writing their own code rather than building upon
what has already been done.

The R community has done a tremendous job in encouraging truly reproducible
research through the package system and tools like Sweave, which provide a
means to combine and maintain data, code and reports-- but we need more.

In my opinion, we need to start seeing websites that provide services
similar to github or bitbucket-- but with a focus on scientific research.  I
should be able to set up a versioned repository somewhere in the cloud for
my research projects that hosts not only my code, but my data and reports. 
I could then choose to make this resource publicly available and other
researchers could fork my work with a single mouse click and start
collaborating on my project or extend what I've done into a project of their
own.

And that's my two cents on the state of software in research.

View this message in context: http://n4.nabble.com/Why-software-fails-in-scientific-research-tp1573062p1573068.html
Sent from the R help mailing list archive at Nabble.com.
