[R] Staging area for data before read into R

Dr Eberhard W Lisse el at lisse.na
Wed Oct 22 07:32:03 CEST 2008


I fully agree: with important or large data sets you cannot be
paranoid enough.

Linux and the Mac make it easy to write scripts that handle
dumping, zipping, copying (locally and elsewhere) and verifying
the data. Once written correctly and tested, they can run fully
automatically with cron. I have been doing this for 15 years.
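A minimal sketch of such a script follows. It is written in Python rather than shell, and it assumes the database has already been dumped to a plain file (for example with mysqldump or pg_dump). The function names (sha256_of, compress, backup) and the choice of gzip and SHA-256 are illustrative assumptions, not the scripts described above. The script compresses the dump, checks that the archive decompresses back to the original, copies it to several directories and re-reads every copy to verify it.

```python
import gzip
import hashlib
import shutil
from pathlib import Path
from typing import BinaryIO


def _digest(fileobj: BinaryIO) -> str:
    """Return the SHA-256 hex digest of an open binary stream, read in 1 MiB chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(1 << 20), b""):
        h.update(chunk)
    return h.hexdigest()


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file on disk."""
    with path.open("rb") as f:
        return _digest(f)


def compress(src: Path) -> Path:
    """Gzip src into src.gz (next to the original) and return the archive path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def backup(dump: Path, targets: list[Path]) -> dict[Path, bool]:
    """Compress a dump, copy it to each target directory and verify every copy.

    Hypothetical helper: returns a mapping from each copy's path to True
    if its checksum matches the freshly written archive.
    """
    archive = compress(dump)

    # Verify the archive actually decompresses back to the original dump.
    with gzip.open(archive, "rb") as f:
        if _digest(f) != sha256_of(dump):
            raise RuntimeError(f"{archive} does not decompress to {dump}")

    reference = sha256_of(archive)
    results: dict[Path, bool] = {}
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(archive, target))
        # Re-read the copy from its destination rather than trusting the write.
        results[copy] = sha256_of(copy) == reference
    return results
```

To run it unattended, add a small `__main__` block that names the real dump file and destination directories, then schedule it with a crontab entry such as `0 3 * * * python3 /path/to/backup.py` (the path and time are placeholders). Any `False` in the returned mapping, or the `RuntimeError`, should be treated as a failed backup and reported.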


And where you are advised to burn 2 DVDs, burn 5 of each. Read the
data back on at least two different hardware platforms and operating
systems. Send at least one copy of each by courier to a collaborating
colleague on a different continent.

As they say: different hard disk, different power supply, different
earthquake :-)-O

el

On 21 Oct 2008, at 21:18, Ted Byers wrote:
>

> [...]

> Dr. Snow is right in recommending going the route of using an RDBMS and in
> saying that it isn't that hard to get started. I'd be recommending
> PostgreSQL, though, since it is relatively easy to use, and it has pl/r
> (which lets you run R code within stored procedures in the DB) which
> carries obvious advantages.

[...]

> If I were in his place, I'd say my data is sacred and cannot be replaced
> (just as you can't step into the same stream twice); and therefore I'd use
> an RDBMS to manage it, and the very moment it is all entered, I'd make a
> backup of both the data (e.g. in MySQL I'd use mysqldump) AND the software,
> and copy both backups to two CDs or DVDs. And, if the data were originally
> recorded on paper, I'd be scanning the pages and copying those images onto
> a couple of CDs or DVDs also: with two copies on optical media, one copy
> can be stored in a fireproof vault while the other is in the office ready
> to be used should an HDD fail, or some other disaster interrupt my work.
> OK, so I'm paranoid about my data, but I'd rather go the extra mile than
> risk losing it.




--
Dr. Eberhard W. Lisse  \        / Obstetrician & Gynaecologist (Saar)
el at lisse.NA el108-ARIN /   *   |   Telephone: +264 81 124 6733 (cell)
PO Box 8421             \     / Please send DNS/NA-NiC related e-mail
Bachbrecht, Namibia     ;____/             to dns-admin at na-nic.com.na



More information about the R-help mailing list