[R] [OT] Advice for medium size data management

MacQueen, Don macqueen1 at llnl.gov
Fri Mar 7 17:37:58 CET 2014


With those kinds of numbers, I would think a database would be appropriate
(instead of spreadsheets).
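
As a rough illustration of the database route, here is a minimal sketch that assumes the DBI and RSQLite packages are installed (neither is mentioned above, and any other database backend would serve equally well). The table name "cohort" and its columns are made up for the example. The point is only that the full table can live in the database while R pulls in just the rows it needs:

```r
# Sketch only: assumes the DBI and RSQLite packages are installed.
# The table name "cohort" and its columns are hypothetical.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory SQLite database; use a file path instead for persistent storage
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Simulated stand-in for the real cohort data
  cohort <- data.frame(id  = 1:90000,
                       grp = sample(letters, 90000, replace = TRUE),
                       stringsAsFactors = FALSE)
  DBI::dbWriteTable(con, "cohort", cohort)

  # Pull only the subset of rows needed for a given analysis
  sub <- DBI::dbGetQuery(con, "SELECT id, grp FROM cohort WHERE grp = 'a'")

  # Row count computed by the database rather than in R
  n_rows <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cohort")$n

  DBI::dbDisconnect(con)
}
```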

You can begin to assess performance of R with 90,000 observations with
experiments like this:

# Simulate 30 character variables with 90,000 observations each
mydat <- list()
for (i in 1:30) mydat[[i]] <- sample(letters, size=90000, replace=TRUE)
names(mydat) <- paste0("V", 1:30)
mydat2 <- as.data.frame(mydat, stringsAsFactors=FALSE)
dim(mydat2)
[1] 90000    30

lapply(mydat2, table)
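
To put numbers on such an experiment, one option is to wrap the call in system.time() and check the memory footprint with object.size(). The following is a self-contained sketch in base R that rebuilds the same simulated data; the column names V1..V30 are an arbitrary choice, and timings will differ from machine to machine:

```r
# Rebuild the simulated data: 30 character columns, 90,000 rows
set.seed(1)
mydat <- lapply(1:30, function(i) sample(letters, size = 90000, replace = TRUE))
names(mydat) <- paste0("V", 1:30)   # arbitrary column names
mydat2 <- as.data.frame(mydat, stringsAsFactors = FALSE)

# Memory used by the data frame
print(object.size(mydat2), units = "MB")

# Time taken to tabulate every column
timing <- system.time(tabs <- lapply(mydat2, table))
print(timing)
```

Swapping table() for a step from the real analysis gives a rough idea of whether it is fast enough at this size.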

-Don


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 3/7/14 7:46 AM, "Marco Barbàra" <jabbba at gmail.com> wrote:

>Dear UseRs,
>
>I am going to be involved in the analysis of a cohort of about 90,000
>people. I don't have the data at hand yet, but I know that right now
>they are archived in spreadsheet files. So far I have only analysed data
>sets of very small size. I will probably be able to work on a
>relatively fast PC, an i7 with 8 or (I hope) 16 GB RAM. I don't know
>the number of variables, but I think I shouldn't need to use anything
>other than "standard" R (i.e. holding the entire data frame in RAM),
>even if I will probably have to use some non-parametric tools, which
>should be a bit more computer-intensive.
>
>Still, since I have no previous experience, it'd be of great help if
>someone could give me some advice on which ways of working would be
>most convenient, both from the point of view of databases and of
>data access, or else tell me if there is simply no reason for me to
>bother at all.
>
>I'm not asking for prepackaged solutions, but rather for pointers to
>useful documentation or other threads (for example: is it worthwhile
>using parallel computing?)
>
>Thank you to anyone for reading this email.
>Marco Barbàra.
>
>P.S.: I work on a Debian system, but this shouldn't matter.
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



