[R] Boundaries of R

Mike Marchywka marchywka at hotmail.com
Fri Feb 18 14:56:51 CET 2011

----------------------------------------
> Date: Fri, 18 Feb 2011 08:39:05 -0500
> From: murdoch.duncan at gmail.com
> To: michael at aers.ca
> CC: r-help at r-project.org
> Subject: Re: [R] Boundaries of R
>
> On 18/02/2011 5:44 AM, Michael Holt wrote:
> > Hello Everyone,
> >
> > I'm pretty new to R and I'm trying to get some idea of the capabilities of
> > the language. I work with some pretty large data sets and the consensus
> > seems to be that R does not work well with big data. I've started talking to
> > the guys at Revolution, but I need to get some outside opinions of what R
> > can actually handle. At about what size does R start to run into problems?
> >
>
> Vectors are limited to about 2 billion entries (2^31 - 1). Matrices are
> vectors, so that limit applies to the total count of entries.
> Dataframes are lists of vectors, so that limit applies separately to the
> numbers of rows and columns.
>
> Simple R code keeps everything in memory, so you're likely to run into
> hardware limits if you start working with really big vectors. There are
> a number of packages that alleviate that by paging data in and out, but
> it takes a bit of work on your part to use them. As far as I know,


Do you have more details here? I anticipate working on large data sets
at some point and maybe writing my own packages. I was considering just
writing the functions to take file names as source and destination, but this
seems a bit restrictive. Also, the "paging" is presumably an issue of local
resources and the algorithm(s). In theory anyway, there could be an interface
that lets the algorithm tell the data structure how it will be accessed. I guess
you could make a factory that builds the right structure if you pass it size info
and some indication of what you hope to do with it, but if something needs to be
prefetched, that should be transparent to the user if the algorithms support it.
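
To make the factory idea concrete, here is a rough sketch in base R, not a
worked-out design. The names make_store and mem_budget_bytes and the returned
read/close functions are made up for illustration, and it assumes the data is
a flat binary file of 8-byte doubles. The caller passes the size and an access
hint; the factory decides whether to load everything or leave it on disk:

```r
# Hypothetical factory: pick a backing store from size and access hints.
# Assumes `path` is a flat binary file of 8-byte doubles.
make_store <- function(path, n_values, access = c("sequential", "random"),
                       mem_budget_bytes = 1e9) {
  access <- match.arg(access)

  if (n_values * 8 <= mem_budget_bytes) {
    # Small enough: load everything into an ordinary numeric vector.
    x <- readBin(path, what = "double", n = n_values)
    read <- function(start, count) x[seq.int(start, length.out = count)]
    return(list(kind = "memory", read = read,
                close = function() invisible(NULL)))
  }

  # Too big for the budget: keep the data on disk and read on demand.
  con <- file(path, open = "rb")
  pos <- 1  # 1-based index of the next value the connection will return
  read <- function(start, count) {
    # Sequential callers usually ask for the next block, so skip the seek then.
    if (access == "random" || start != pos) {
      seek(con, where = (start - 1) * 8, origin = "start")
    }
    out <- readBin(con, what = "double", n = count)
    pos <<- start + length(out)
    out
  }
  list(kind = "file", read = read, close = function() close(con))
}

# Usage: the calling code is the same whichever store it gets back.
tmp <- tempfile()
writeBin(as.double(1:100), tmp)
small <- make_store(tmp, 100)                        # fits: held in memory
big   <- make_store(tmp, 100, mem_budget_bytes = 0)  # forced onto disk
small$read(11, 3)  # 11 12 13
big$read(11, 3)    # 11 12 13
big$close()
unlink(tmp)
```

The point is only that the algorithm sees read(start, count) either way, so
whatever paging or prefetching the file-backed version does stays hidden from
the user.
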



> Revolution offers nothing in this area that isn't on CRAN, but they can
> certainly give you advice.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.