[R] [OT] Advice for medium-sized data management

Marco Barbàra jabbba at gmail.com
Fri Mar 7 16:46:56 CET 2014


Dear UseRs,

I am going to be involved in the analysis of a cohort of about 90,000
people. I don't have the data at hand yet, but I know that at the
moment they are archived in spreadsheet files. So far I have only
analysed very small data sets. I will probably be able to work on a
relatively fast PC, an i7 with 8 or (I hope) 16 GB of RAM. I don't
know the number of variables, but I think I shouldn't need anything
other than "standard" R (i.e. holding the entire data frame in RAM),
even if I will probably have to use some non-parametric tools, which
may be a bit more compute-intensive.
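
As a rough check (the number of variables below is just a guess, since
I don't know the real figure yet, and the file name is made up), a
back-of-the-envelope calculation suggests the data should fit
comfortably in RAM:

    n_rows <- 90000
    n_cols <- 100                  # hypothetical: true column count unknown
    n_rows * n_cols * 8 / 1024^2   # MB if all-numeric: about 69 MB

    ## Reading a CSV export of the spreadsheets with base R
    ## (readxl::read_excel could read .xlsx files directly instead):
    cohort <- read.csv("cohort.csv", stringsAsFactors = FALSE)
    print(object.size(cohort), units = "MB")   # actual in-memory footprint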

Still, since I have no previous experience, it would be of great help
if someone could give me some advice on the most convenient way to
work, both from the point of view of databases and of data access, or
tell me whether there is simply no reason for me to bother at all.
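
For instance, if keeping everything in memory turned out to be
inconvenient, would a lightweight database be a reasonable route?
Here is a minimal sketch of what I mean, assuming the DBI and RSQLite
packages are installed; `cohort` is the data frame from the snippet
above, and the "age" column in the query is hypothetical:

    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "cohort.sqlite")
    dbWriteTable(con, "cohort", cohort)     # one-off import of the data frame
    elderly <- dbGetQuery(con, "SELECT * FROM cohort WHERE age > 65")
    dbDisconnect(con)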

I'm not asking for prepackaged solutions, but rather for pointers to
useful documentation or to other threads (for example: is it
worthwhile using parallel computing?).
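
To make that last question concrete, here is a minimal sketch of the
kind of thing I have in mind, using the parallel package that ships
with R (the bootstrap statistic, data, and core count are all made up;
note that mclapply forks, so it works on Debian but not on Windows):

    library(parallel)
    x <- rnorm(90000)                        # stand-in for a real variable
    meds <- mclapply(1:1000, function(i) {
        median(sample(x, replace = TRUE))    # one bootstrap replicate
    }, mc.cores = 4)
    quantile(unlist(meds), c(0.025, 0.975))  # bootstrap 95% interval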

Thank you to anyone who takes the time to read this email.
Marco Barbàra.

P.S.: I work on a Debian system, but this shouldn't matter.


