[BioC] working with large dataframes in R
Elena Sorokin
sorokin at wisc.edu
Wed May 25 22:09:21 CEST 2011
Hello,

I was advised to seek help on this list. When working with large tables
of count data (or any other type of data, for that matter), R runs out
of RAM. Specifically, I'm trying to visualize a large data set of count
data (55,840 rows by 4 columns) using the graphics package ggplot2, and
when I try to make a complex scatterplot, I get an error message. I've
pasted example code below, along with a description of the data frame.

Any advice about how to store this data.frame object in a less
memory-intensive way would be greatly appreciated. Should I just
increase my memory limit? Alternatively, I don't know anything about SQL
or relational databases, but I am willing to learn if that is really the
key to working with large objects in R.

Sincerely,
Elena

> library(ggplot2)
# I already loaded my data into a data frame object using read.delim
> summary(df)
     X.val           Y.val       time.value graph.type
 0      :20642   0      :20737   1:55840    D1vD2:27920
 1      : 2139   1      : 2310              U1vU2:27920
 2      : 1162   2      : 1150
 3      :  774   3      :  797
 4      :  607   4      :  572
 5      :  535   5      :  513
 (Other):29981   (Other):29761
> class(df)
[1] "data.frame"
> dim(df)
[1] 55840 4
> qplot(X.val,Y.val, data= df, colour=graph.type)
Error: cannot allocate vector of size 119.2 Mb
In addition: Warning messages:
1: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
Reached total allocation of 1535Mb: see help(memory.size)
3: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
Reached total allocation of 1535Mb: see help(memory.size)
4: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
Reached total allocation of 1535Mb: see help(memory.size)
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2
loaded via a namespace (and not attached):
[1] tools_2.13.0
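
One thing that may be worth checking, offered as a hedged observation
rather than a confirmed diagnosis: summary(df) above prints per-value
counts for X.val and Y.val, which is how R summarizes factor columns, so
read.delim may have stored the counts as factors rather than numbers.
The sketch below uses a small made-up data frame (toy_df is
hypothetical, not the real data) to show how to inspect column types and
convert a factor of digit strings back to numeric values.

```r
# Hypothetical stand-in for the real data: counts stored as factors.
toy_df <- data.frame(
  X.val = factor(c("0", "1", "12", "3")),
  Y.val = factor(c("0", "2", "5", "40")),
  graph.type = factor(c("D1vD2", "D1vD2", "U1vU2", "U1vU2"))
)

# Inspect how each column is stored and how much memory the object uses.
str(toy_df)
print(object.size(toy_df), units = "Kb")

# Convert via character: as.numeric() applied directly to a factor
# returns the internal level codes, not the printed values.
toy_df$X.val <- as.numeric(as.character(toy_df$X.val))
toy_df$Y.val <- as.numeric(as.character(toy_df$Y.val))

# Confirm the new storage class of every column.
sapply(toy_df, class)
```

Entries that are not valid numbers become NA (with a warning) during
this conversion, so it is worth counting the NAs afterwards. Whether
plotting numeric columns avoids the allocation error on the real data is
untested here.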