[BioC] How do you keep track of your analyses

Sean Davis sdavis2 at mail.nih.gov
Tue Sep 23 18:05:26 CEST 2008


On Tue, Sep 23, 2008 at 11:51 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
> Hello,
>
> I am doing an increasing number of Bioconductor analyses for various
> people and I am starting to find it difficult to keep track of what I
> have done previously.  A common question six months after the initial
> analysis is something like "Can you do the same as x but change y?".
> Does anyone have any ideas on the best way to do this?
>
> The essential components to keep track of are:
> * input files
> * R code used
> * output files
> * a description of what you aim to do
>
> The two possibilities that I can think of are:
> 1) Some structured directories e.g.
> ProjectName_Person
>        /Description.txt
>        /Analysis1_date
>                /InputFiles
>                /Rcode
>                /Output/Outputfiles

I store all the raw data and sample information in a directory called
"Results" and R code in a directory called "R", with subdirectories for
figures and textual output.  I use ESS/Emacs for everything, so it is
easy to maintain a "script" file that contains every R command I use.
If I need to rerun an analysis, I can do so by simply running the
entire script.  With ESS/Emacs, however, I can also easily submit just
the pieces I need when redoing a subset of the original analysis.
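A minimal R sketch of that layout, assuming a hypothetical helper `make_project()` and a hypothetical script name `analysis.R` (neither comes from the post): it creates "Results" for raw data and sample information, plus "R" with "figures" and "output" subdirectories, and seeds the single script meant to hold every command.

```r
# Hypothetical helper: builds the directory layout described above under `root`.
# Directory names follow the post; "analysis.R" is an assumed script name.
make_project <- function(root) {
  dirs <- file.path(root, c("Results", "R", "R/figures", "R/output"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

  script <- file.path(root, "R", "analysis.R")
  if (!file.exists(script)) {
    # Seed the script; every command used in the analysis is appended here, in order.
    writeLines(c(
      "## All commands for this analysis, in order, so that",
      "## source() on this file reruns the whole thing.",
      "sessionInfo()"
    ), script)
  }
  invisible(script)
}

proj <- file.path(tempdir(), "ProjectName_Person")
script <- make_project(proj)
list.dirs(proj, full.names = FALSE)
```

Rerunning the full analysis is then `source(script, echo = TRUE)`; sending individual lines or regions from ESS covers the partial case.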

Another option is to create a Sweave document for each project.  With
Seth Falcon's weaver package, this becomes feasible for larger projects
because the results of code chunks can be cached.  Also, if you need to
run a subset of the analysis for a quick check, you can always pull out
the relevant R code pretty easily.
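As a sketch of the Sweave route, using only `Sweave()` and `Stangle()` from R's utils package (the file name `analysis.Rnw` and its contents are hypothetical): `Sweave()` runs the code chunks and writes a LaTeX report, while `Stangle()` extracts just the R code, which corresponds to pulling out the relevant code for a quick check. weaver's caching is not shown here.

```r
# Hypothetical minimal Sweave file; a real project would hold the full analysis.
rnw <- file.path(tempdir(), "analysis.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<simulate>>=",
  "set.seed(1)",
  "x <- rnorm(20)",
  "@",
  "The mean is \\Sexpr{round(mean(x), 3)}.",
  "\\end{document}"
), rnw)

# Sweave() and Stangle() (both in utils) write their output to the working directory.
old <- setwd(tempdir())
Sweave(rnw)   # runs the chunks and writes analysis.tex (compile with LaTeX separately)
Stangle(rnw)  # extracts only the R code into analysis.R
setwd(old)
```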

One step I have not taken is to version-control the R scripts for
individual projects.  I do, however, version-control all the common R
code that I use.  Using something like git or svn is VERY helpful for
anyone doing any amount of coding.

> 2) Some sort of personal wiki like TiddlyWiki
>
> It's got to be searchable in some form, too.

I do use a wiki at times for highly collaborative, complicated,
long-term projects where communication is key.  However, I find it a
bit tedious for the typical gene expression study, where the task can
be done quickly.

Obviously, just my $0.02 worth.

Sean


