[R] Computations slow in spite of large amounts of RAM.

Douglas Bates bates at stat.wisc.edu
Tue Jul 1 16:31:13 CEST 2003


"Huiqin Yang" <Huiqin.Yang at noaa.gov> writes:

> Hi all,
> 
> I am a beginner trying to use R to work with large amounts of
> oceanographic data, and I find that computations can be VERY slow.  In
> particular, computational speed seems to depend strongly on the number
> and size of the objects that are loaded (when R starts up).  The same
> computations are significantly faster when all but the essential
> objects are removed.  I am running R on a machine with 16 GB of RAM,
> and our unix system manager assures me that there is memory available
> to my R process that has not been used.
> 
> 1.  Is the problem associated with how R uses memory?  If so, is there
> some way to increase the amount of memory used by my R process to get
> better performance?

You could try setting a large nsize and vsize using 

 mem.limits

See the description in ?Memory
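
Before changing any limits it may help to see what the session is actually holding. The following is only a sketch using base R's object.size() and gc(); the objects big and small are made up for illustration and stand in for whatever is loaded at startup.

```r
# Hypothetical objects standing in for the loaded workspace
big   <- numeric(1e6)
small <- 1:10

# Size in bytes of each named object, largest first
sizes <- sapply(c("big", "small"),
                function(nm) as.numeric(object.size(get(nm))))
sizes <- sort(sizes, decreasing = TRUE)
print(sizes)

# gc() runs a garbage collection and reports memory in use,
# with one row for cons cells (Ncells) and one for the vector heap (Vcells)
mem <- gc()
print(mem)
```

Objects that turn out to be large and unneeded can then be dropped with rm() before the heavy computation.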

> The computations that are particularly slow involve looping with
> by().  The data are measurements of vertical profiles of pressure,
> temperature, and salinity at a number of stations, which are organized
> into a dataframe p.1 (1925930 rows, 8 columns: id, p, t, s, etc.),
> and the objective is to get a much smaller dataframe (there are 1409
> unique values of id) with the minimum and maximum pressure for each
> profile.  The slow part is:
> 
> h.maxmin <- by(p.1,p.1$id,function(x){
>              data.frame(id=x$id[1],
>                       maxp=max(x$p),
>                       minp=min(x$p))})

I think it would be faster to use

h.maxmin <- tapply(p.1$p, p.1$id, range)

In the call to by() you are subsetting the entire data frame, which
probably means taking at least one copy of that frame.  If you use
tapply() on only the relevant columns you will use much less space.
Since range() returns c(min, max), the result holds both values for
each id.
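
As a rough sketch of the tapply() approach, the code below uses a small made-up data frame in place of p.1 (the values are invented for illustration) and reshapes the result into the kind of data frame the original by() call was building. The column names minp and maxp follow the question.

```r
# Toy stand-in for p.1 (hypothetical values; the real frame has ~1.9M rows)
p.1 <- data.frame(id = rep(1:3, each = 4),
                  p  = c(5, 10, 20, 50, 2, 8, 30, 100, 1, 4, 9, 16))

# One c(min, max) pair per id; only the p and id columns are touched
r <- tapply(p.1$p, p.1$id, range)

# Flatten the pairs into a two-column matrix, one row per id
m <- matrix(unlist(r), ncol = 2, byrow = TRUE)

# Assemble the small summary data frame
h.maxmin <- data.frame(id   = dimnames(r)[[1]],
                       minp = m[, 1],
                       maxp = m[, 2])
print(h.maxmin)
```

Note that id comes back as the group labels (character strings), so it may need converting if a numeric id is wanted.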

> 2.  Even with unneeded data objects removed, this is very slow.  Is
> there a faster way to get the maximum and minimum values?

See above.


-- 
Douglas Bates                            bates at stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/



