[R] Q: Suggestions for long-term data/program storage policy?

sosman sourceforge at metrak.com
Tue Oct 11 11:02:49 CEST 2005


Alexander Ploner wrote:
> Dear list,
> 
> we are a statistical/epidemiological department that - after a few
> years of rapid growth - is finally getting around to formulating a
> general data storage and retention policy - mainly to ensure that we
> can reproduce results from published papers/theses more easily in the
> future, but also with the hope that we get more synergy between
> related projects.
> 
> We have formulated what we feel is a reasonable draft, requiring  
> basically that the raw data, all programs to create derived data  
> sets, and the analysis programs are stored and documented in a  
> uniform manner, regardless of the analysis software used. The minimum  
> data retention we are aiming for is 10 years, and the format for the  
> raw data is quite sane (either flat ASCII or real
> 
> Given the rapid development cycle of R, this suggests that at the very
> least all non-base packages used in the analysis are stored together  
> with each project. I have basically two questions:
> 
> 1) Are old R versions (binaries/sources) going to be available on  
> CRAN indefinitely?
> 
> 2) Is .RData a reasonable file format for long term storage?
> 
> I would also be very grateful for any other suggestions, comments or
> links for setting up and implementing such a storage policy
> (R-specific or otherwise).

I am coming more from a software development angle, but you might want to
take a look at Subversion for versioning your projects.  For non-geeky
types, TortoiseSVN has a point-and-click interface.

It handles binary files efficiently and you can easily go back and get 
earlier versions of your projects.

http://subversion.tigris.org/
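
As a rough sketch of what "going back to an earlier version" looks like
from the command line: this assumes the Subversion command-line tools
(svn and svnadmin) are installed, and the throwaway repository location
and the file name raw_data.csv are made up purely for illustration.

```shell
#!/bin/sh
# Illustrative sketch only: the repository lives in a temporary directory
# and raw_data.csv is a hypothetical file, not part of any real project.
if ! command -v svn >/dev/null 2>&1 || ! command -v svnadmin >/dev/null 2>&1; then
    echo "Subversion command-line tools not found" >&2
else
    work=$(mktemp -d)

    # Create an empty local repository and check out a working copy of it.
    svnadmin create "$work/repo"
    svn checkout -q "file://$work/repo" "$work/wc"
    cd "$work/wc"

    # Revision 1: add the first version of the data file.
    printf 'id,value\n1,3.2\n' > raw_data.csv
    svn add -q raw_data.csv
    svn commit -q -m "Add raw data"

    # Revision 2: the file changes later on.
    printf 'id,value\n1,3.2\n2,4.7\n' > raw_data.csv
    svn commit -q -m "Add second record"

    # Print the file exactly as it was at revision 1.
    svn cat -r 1 raw_data.csv
fi
```

For a real project you would point the checkout at a shared repository
rather than a temporary one, and "svn update -r N" would bring the whole
working copy back to revision N.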



