[R] error loading huge .RData

Luke Tierney luke at stat.umn.edu
Wed Apr 24 15:59:01 CEST 2002


On Wed, Apr 24, 2002 at 02:39:15PM +0200, Peter Dalgaard BSA wrote:
> "Liaw, Andy" <andy_liaw at merck.com> writes:
> 
> > Patrick,
> > 
> > I appreciate your comments, and practice everything that you preach.
> > However, that workspace image contains only 2~3 R objects: the input and
> > output of a single R command.  I knew there could be problems, so I've
> > stripped it down to the bare minimum.  Yes, I also kept the commands in a
> > script.  That single command (in case you want to know: a random forest run
> > with 4000 rows and nearly 7000 variables) took over 3 days to run.  There's
> > not a whole lot I can do here when the data is so large.
> 
> Hmm. You could be running into some sort of situation where data
> temporarily take up more space in memory than they need to. It does
> sound like a bit of a bug if R can write images that are bigger than
> it can read. Not sure how to proceed though. Does anyone on R-core
> have a similarly big system and a spare gigabyte of disk? Is it
> possible to create a mock-up of similarly organized data that displays
> the same effect, but takes less than three days?

I guess we could make sure the write fails as well :-)

Actually that isn't entirely flippant. The serialization mechanism
only preserves sharing that is semantically meaningful (symbols,
environments, external references and weak references).  This has been
so since the first change in the save format in R 0.something.  As a
result, an object that is saved and then loaded may take up more
memory than the original did.  It would be possible to preserve all
sharing within a single save operation, but that would require keeping
track of every object as it is written, which itself takes more
memory and hence could make the write fail.
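
A minimal sketch of this effect, assuming a current R version (where
list(x, x) typically refers to x twice without copying it).  The
object names are made up for illustration.  It compares serialized
sizes rather than measuring memory directly, and checks that the
identity of an environment survives a round trip:

```r
# Sketch: sharing of ordinary vectors is not recorded by serialization,
# while sharing of environments is.
set.seed(1)
x <- rnorm(1e5)            # plain numeric vector, about 800 KB of data
shared <- list(x, x)       # two references to x (assumed not copied here)

size_one <- length(serialize(x, NULL))
size_two <- length(serialize(shared, NULL))
ratio <- size_two / size_one   # x is written out once per reference

# Environments are reference objects, so their identity is preserved.
e <- new.env()
restored <- unserialize(serialize(list(e, e), NULL))
same_env <- identical(restored[[1]], restored[[2]])

cat("serialized size ratio:", round(ratio, 2), "\n")
cat("environment sharing preserved:", same_env, "\n")
```

If the ratio comes out near 2, the vector was written once per
reference, so the restored list needs room for two full copies.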

It is fairly hard to create values with shared data structures at the
R level (though easy in C), so this hasn't been much of an issue.  One
place where we might be getting bitten, though, is the way names are
attached to things: names are often shared when objects are created,
but our save/load strategy writes them out separately for each object
and so duplicates them on load.  Whether that is the issue here is
hard to tell.
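
A rough way to gauge how much names can contribute, sketched under
assumptions: the sizes and variable names below are invented and have
nothing to do with the actual workspace in question.  It builds many
vectors carrying the same names and compares the serialized size with
and without them:

```r
# Sketch: every object carries its own copy of the names when serialized.
n_obj <- 50                                    # hypothetical object count
n_len <- 2000                                  # hypothetical vector length
nm <- sprintf("variable_%05d", seq_len(n_len)) # one set of names

with_names <- lapply(seq_len(n_obj), function(i) {
  v <- numeric(n_len)
  names(v) <- nm          # the same names attached to every vector
  v
})
without_names <- lapply(seq_len(n_obj), function(i) numeric(n_len))

size_with    <- length(serialize(with_names, NULL))
size_without <- length(serialize(without_names, NULL))
size_names   <- length(serialize(nm, NULL))    # the names written once

# Bytes attributable to names, relative to one copy of the names
names_copies <- (size_with - size_without) / size_names
cat("names written roughly", round(names_copies), "times\n")
```

If names_copies is close to n_obj, the names were written once per
object, whatever sharing existed in memory beforehand.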

luke

-- 
Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke at stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu


