[BioC] working with large dataframes in R
Robert Baer
rbaer at atsu.edu
Thu May 26 00:43:24 CEST 2011
I ran the following code, which seems to simulate your data with no
problem on a computer with only 2 GB of memory. Further, this is not a
particularly large data frame.
# simulate 55,840 rows of 0-10 count data, split across two graph types
X.val <- sample(0:10, 55840, replace = TRUE)
Y.val <- sample(0:10, 55840, replace = TRUE)
time.value <- 1:55840
graph.type <- c(rep('D1vD2', 27920), rep('U1vU2', 27920))
df <- data.frame(X.val, Y.val, time.value, graph.type)
library(ggplot2)
qplot(X.val, Y.val, data = df, colour = graph.type)
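(The 0:10 value range and the two graph.type levels are meant to mirror
the summary() output in your message.)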
My guess is that either a large amount of your memory is being used for
something else, or you should try again after restarting everything.
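In case it helps, here is a minimal sketch of how to see where the memory
is going, assuming 32-bit Windows as your sessionInfo() suggests
(memory.size() and memory.limit() are Windows-only):

gc()             # force a garbage collection and report what R is holding
memory.size()    # MB currently in use by R (Windows only)
memory.limit()   # current cap in MB; memory.limit(2500) raises it if the RAM is there
# list workspace objects from largest to smallest, so you can rm() the big ones
sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE)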
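One more hedged guess, based on the summary() output below: X.val and
Y.val appear to have been read in as factors, and the paste() call in
your warnings looks like base R's interaction() crossing factor levels
while ggplot2 builds its groups, which blows up when the factors have
many levels. Converting those columns to integers before plotting should
be leaner, and is probably what you want for a scatterplot anyway:

# df is the data frame from your read.delim call
df$X.val <- as.integer(as.character(df$X.val))  # factor -> integer counts
df$Y.val <- as.integer(as.character(df$Y.val))
print(object.size(df), units = "Mb")            # check the footprint afterwards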
Rob
------------------------------------------
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965
--------------------------------------------------
From: "Elena Sorokin" <sorokin at wisc.edu>
Sent: Wednesday, May 25, 2011 3:09 PM
To: <bioconductor at r-project.org>
Subject: [BioC] working with large dataframes in R
> Hello, I was recommended to seek out help from this forum. When working
> with large tables of count data (or any other type of data, for that
> matter), R runs out of RAM. Specifically, I'm trying to visualize a large
> data set consisting of count data (55,840 rows by 4 columns) using the
> graphical package ggplot2, and when I try to make a complex scatterplot, I
> get an error message. I've pasted example code below, along with a
> description of what the data frame contains. Any advice about how to store
> this data.frame object in a less memory-intensive way would be greatly
> appreciated. Should I just increase my memory limit? Alternatively, I
> don't know anything about SQL and relational databases, but am willing to
> learn if this is really the key to working with large objects in R.
> Sincerely, Elena
>
> > library(ggplot2)
> # I already loaded my data into a data frame object using read.delim
> > summary(df)
>   X.val           Y.val          time.value   graph.type
>  0      :20642   0      :20737   1:55840      D1vD2:27920
>  1      : 2139   1      : 2310                U1vU2:27920
>  2      : 1162   2      : 1150
>  3      :  774   3      :  797
>  4      :  607   4      :  572
>  5      :  535   5      :  513
>  (Other):29981   (Other):29761
> > class(df)
> [1] "data.frame"
> > dim(df)
> [1] 55840 4
> > qplot(X.val,Y.val, data= df, colour=graph.type)
> Error: cannot allocate vector of size 119.2 Mb
> In addition: Warning messages:
> 1: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
> Reached total allocation of 1535Mb: see help(memory.size)
> 2: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
> Reached total allocation of 1535Mb: see help(memory.size)
> 3: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
> Reached total allocation of 1535Mb: see help(memory.size)
> 4: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
> Reached total allocation of 1535Mb: see help(memory.size)
>
> > sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>