[R] Memory issues in R

Avram Aelony aavram at mac.com
Tue Apr 28 05:36:53 CEST 2009


Others may have mentioned this, but you might try loading your data
into a small database like MySQL and then pulling smaller portions of
it into R via a package like RMySQL or RODBC.
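
For instance, a minimal sketch with RMySQL, assuming the CSV has
already been loaded into a MySQL table; the database name, credentials,
and the table name "bat_calls" are made up for illustration (RODBC
works along the same lines):

  library(RMySQL)   # DBI-based MySQL interface

  ## connect to a local MySQL server (all connection details are placeholders)
  con <- dbConnect(MySQL(), dbname = "bats", user = "me", password = "secret")

  ## pull only the two columns the density estimate needs, instead of
  ## reading all 1.2 million rows of every field into R at once
  fc_sc <- dbGetQuery(con, "SELECT Fc, Sc FROM bat_calls")

  ## or push the aggregation into the database and bring back a tiny table
  by_bin <- dbGetQuery(con,
      "SELECT ROUND(Fc) AS Fc_bin, COUNT(*) AS n, AVG(Sc) AS mean_Sc
       FROM bat_calls GROUP BY ROUND(Fc)")

  dbDisconnect(con)

Either way, R only ever holds the query result, not the full table.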

One approach might be to split the data file into smaller pieces
outside of R, read the pieces into R one at a time, and build
aggregations (counts and sums of your data fields) as you go.  From
these you can create an "aggregated" dataset that is much smaller and
more manageable, which you can then graph with ggplot2 or other
libraries of your choice.
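
A rough sketch of that idea, assuming the file layout from the quoted
post; the 100,000-row chunk size and rounding Fc to the nearest integer
as the binning rule are arbitrary choices for illustration:

  infile    <- "C:/R-Stats/Bat calls/Reduced bats.csv"   # path from the quoted session
  chunksize <- 100000
  pieces    <- list()

  con <- file(infile, open = "r")
  hdr <- readLines(con, n = 1)               # consume the header row
  repeat {
    chunk <- tryCatch(read.csv(con, header = FALSE, nrows = chunksize),
                      error = function(e) NULL)   # read.csv errors at end of file
    if (is.null(chunk) || nrow(chunk) == 0) break
    names(chunk)[1:8] <- c("Dur","TBC","Fmax","Fmin","Fmean","Fc","S1","Sc")
    ## per-chunk aggregation: call counts and summed Sc per rounded Fc bin
    part <- data.frame(n = 1, Sc_sum = chunk$Sc)
    pieces[[length(pieces) + 1]] <-
      aggregate(part, by = list(Fc_bin = round(chunk$Fc)), FUN = sum)
    if (nrow(chunk) < chunksize) break
  }
  close(con)

  ## collapse the per-chunk summaries into one small data frame
  tot <- do.call(rbind, pieces)
  agg <- aggregate(tot[c("n", "Sc_sum")], by = list(Fc_bin = tot$Fc_bin), FUN = sum)
  agg$Sc_mean <- agg$Sc_sum / agg$n

The resulting "agg" has a handful of rows at most, so qplot() or
ggplot() calls on it stay cheap regardless of how large the original
file was.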

-Avram



On Apr 26, 2009, at 8:20 AM, Neotropical bat risk assessments wrote:

>
>    How do people deal with R and memory issues?
>    I have tried using gc() to see how much memory is used at each step.
>    I scanned the Crawley R-Book, all the other R books I have available,
>    and the on-line FAQ, but found no real help.
>    Running WinXP Pro (32 bit) with 4 GB RAM.
>    One SATA drive pair is in RAID 0 configuration with 10000 MB allocated
>    as virtual memory.
>    I do have another machine set up with Ubuntu, but it only has 2 GB RAM
>    and I have not been able to get R installed on that system.
>    I can run smaller sample data sets w/o problems and everything plots
>    as needed.
>    However, I need to review large data sets.
>    Using the latest R version, 2.9.0 (2009-04-17).
>    My data is in CSV format with a header row and is a big data set with
>    1,200,240 rows!
>    E.g. below:
>    Dur,TBC,Fmax,Fmin,Fmean,Fc,S1,Sc,
>    9.81,0,28.78,24.54,26.49,25.81,48.84,14.78,
>    4.79,1838.47,37.21,29.41,31.76,29.52,241.77,62.83,
>    4.21,5.42,28.99,26.23,27.53,27.4,76.03,11.44,
>    10.69,193.48,30.53,25.4,27.69,25.4,-208.19,26.05,
>    15.5,248.18,30.77,24.32,26.57,24.92,-202.76,18.64,
>    14.85,217.47,31.25,24.62,26.93,25.56,-88.4,10.32,
>    11.86,158.01,33.61,25.24,27.66,25.32,83.32,17.62,
>    14.05,229.74,30.65,24.24,26.76,25.24,61.87,14.06,
>    8.71,264.02,31.01,25.72,27.56,25.72,253.18,19.2,
>    3.91,10.3,25.32,24.02,24.55,24.02,-71.67,16.83,
>    16.11,242.21,29.85,24.02,26.07,24.62,79.45,19.11,
>    16.81,246.48,28.57,23.05,25.46,23.81,-179.82,15.95,
>    16.93,255.09,28.78,23.19,25.75,24.1,-112.21,16.38,
>    5.12,107.16,32,29.41,30.46,29.41,134.45,20.88,
>    16.7,150.49,27.97,22.92,24.91,23.95,42.96,16.81
>    .... etc
>    I am getting the following warning/error message:
>    Error: cannot allocate vector of size 228.9 Mb
>    Complete listing from R console below:
>> library(batcalls)
>    Loading required package: ggplot2
>    Loading required package: proto
>    Loading required package: grid
>    Loading required package: reshape
>    Loading required package: plyr
>    Attaching package: 'ggplot2'
>            The following object(s) are masked from package:grid :
>             nullGrob
>> gc()
>             used (Mb) gc trigger (Mb) max used (Mb)
>    Ncells 186251  5.0     407500 10.9   350000  9.4
>    Vcells  98245  0.8     786432  6.0   358194  2.8
>> BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
>> gc()
>              used (Mb) gc trigger  (Mb) max used  (Mb)
>    Ncells  188034  5.1     667722  17.9   378266  10.2
>    Vcells 9733249 74.3   20547202 156.8 20535538 156.7
>> attach(BR)
>> library(ggplot2)
>> library(MASS)
>> library(batcalls)
>> BRC<-kde2d(Sc,Fc)
>    Error: cannot allocate vector of size 228.9 Mb
>> gc()
>               used  (Mb) gc trigger  (Mb)  max used  (Mb)
>    Ncells   198547   5.4     667722  17.9    378266  10.2
>    Vcells 19339695 147.6  106768803 814.6 124960863 953.4
>>
>    Tnx for any insight,
>    Bruce



