[BioC] working with large dataframes in R

Robert Baer rbaer at atsu.edu
Thu May 26 00:43:24 CEST 2011


I ran the following code, which would seem to simulate your data, with no 
problem on a computer with only 2 GB of memory.  Further, this is not a 
particularly large data frame.

# Simulate 55,840 rows of count data in two groups of 27,920 rows each
X.val = sample(0:10, 55840, replace = TRUE)
Y.val = sample(0:10, 55840, replace = TRUE)
time.value = 1:55840
graph.type = c(rep('D1vD2', 27920), rep('U1vU2', 27920))
df = data.frame(X.val, Y.val, time.value, graph.type)

library(ggplot2)
qplot(X.val,Y.val, data= df, colour=graph.type)

My guess is that either a large amount of your memory is being used by 
something else, or you should try again after restarting everything.
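
One way to check this guess is to measure how much memory the data frame 
itself takes and how much R is using overall. The following is only a 
minimal sketch using base R (object.size, gc, sapply). It rebuilds the 
simulated data so that it runs on its own; the names n and df.mb are just 
for illustration. The class check at the end is an assumption on my part: 
the summary() output in the original message lists counts per value for 
X.val and Y.val, which is how factors are printed, so it may be worth 
confirming that the count columns are stored as numbers.

```r
# Rebuild the simulated data frame so this snippet is self-contained.
set.seed(1)
n <- 55840
df <- data.frame(
  X.val = sample(0:10, n, replace = TRUE),
  Y.val = sample(0:10, n, replace = TRUE),
  time.value = 1:n,
  graph.type = c(rep("D1vD2", n / 2), rep("U1vU2", n / 2))
)

# Size of the data frame itself, converted from bytes to megabytes.
df.mb <- as.numeric(object.size(df)) / 1024^2
print(df.mb)

# Run a garbage collection and report the memory R is currently using.
print(gc())

# Class of each column; count columns should normally be integer or numeric.
print(sapply(df, class))
```

If the data frame turns out to be only a few megabytes while R is close to 
its limit, the memory is going somewhere other than the data.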

Rob
------------------------------------------
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965


--------------------------------------------------
From: "Elena Sorokin" <sorokin at wisc.edu>
Sent: Wednesday, May 25, 2011 3:09 PM
To: <bioconductor at r-project.org>
Subject: [BioC] working with large dataframes in R

> Hello, I was recommended to seek out help from this forum. When working 
> with large tables of count data (or any other type of data, for that 
> matter), R runs out of RAM. Specifically, I'm trying to visualize a large 
> data set consisting of count data (55,840 rows by 4 columns) using the 
> graphical package ggplot2, and when I try to make a complex scatterplot, I 
> get an error message. I've pasted an example code below, along with some 
> description of what the data frame is. Any advice about how to store this 
> data.frame object in a less memory-intensive way would be greatly 
> appreciated. Should I just increase my memory-limit? Alternatively, I 
> don't know anything about SQL and relational databases, but am willing to 
> learn, if this is really the key to working with large objects in R. 
> Sincerely, Elena
>
> > library(ggplot2)
> # I already loaded my data into a data frame object using read.delim
> > summary(df)
>      X.val           Y.val       time.value graph.type
>  0      :20642   0      :20737   1:55840    D1vD2:27920
>  1      : 2139   1      : 2310              U1vU2:27920
>  2      : 1162   2      : 1150
>  3      :  774   3      :  797
>  4      :  607   4      :  572
>  5      :  535   5      :  513
>  (Other):29981   (Other):29761
> > class(df)
> [1] "data.frame"
> > dim(df)
> [1] 55840     4
> > qplot(X.val,Y.val, data= df, colour=graph.type)
> Error: cannot allocate vector of size 119.2 Mb
> In addition: Warning messages:
> 1: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 2: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 3: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
>   Reached total allocation of 1535Mb: see help(memory.size)
> 4: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
>   Reached total allocation of 1535Mb: see help(memory.size)
>
> > sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.0