[Rd] Vector binding on harddisk

Simon Urbanek simon.urbanek at r-project.org
Thu Feb 14 16:32:08 CET 2008


On Feb 14, 2008, at 6:32 AM, _ wrote:

> Hi all,
> Using big vectors (more than 4GB) is unfortunately not possible under
> Windows or other OSes if not enough RAM exists.
> Could it be possible to implement a new data type in R, like a
> vector, but instead of holding the data in memory, the data lies in
> a file? When data is accessed, the vector would fetch the information
> automatically from the file.
> There is a package out there (named ff), but the access boundaries have
> to be declared by the user, which is a disadvantage.
>

I don't think you have been reading the documentation carefully enough:
ff doesn't impose any limits itself. Whatever limits you hit with it
are due to the OS and/or R, so you cannot write the package you
describe without hitting those limits. They are as follows: the size
of an integer in R, which limits the length of a single vector
(2^31-1, roughly 2G entries, since R uses 32-bit integers), and the
file size limit of your OS. The former is a really hard limit; the
only way to overcome it (without modifying R) is to use multiple
indices (which the ff package suggests). You can overcome the file
size limit simply by using multiple files (or by using a more
reasonable OS).
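To make the "multiple indices plus multiple files" workaround concrete, here is a minimal Python sketch. It is not how ff is implemented. The class name ChunkedFileVector, the chunk_len parameter and the MAX_INDEX constant are hypothetical, and it assumes every element is an 8-byte double. Each global index i is split into a pair (file number, offset within file), so neither the per-file index nor any single file has to exceed its limit:

```python
import os
import struct
import tempfile

# Hypothetical stand-in for R's per-vector length cap (2^31 - 1).
MAX_INDEX = 2**31 - 1
# Assumption: elements are little-endian 8-byte doubles.
DOUBLE = struct.Struct("<d")


class ChunkedFileVector:
    """Disk-backed vector of doubles split across several files.

    Element i lives in file i // chunk_len at position i % chunk_len,
    so each file holds at most chunk_len elements (small enough to stay
    under both a 32-bit index and an OS file-size limit).
    """

    def __init__(self, directory, length, chunk_len):
        if not 0 < chunk_len <= MAX_INDEX:
            raise ValueError("chunk_len must fit in a 32-bit signed index")
        self.length = length
        self.chunk_len = chunk_len
        n_chunks = (length + chunk_len - 1) // chunk_len
        self.paths = []
        for c in range(n_chunks):
            path = os.path.join(directory, f"chunk{c}.bin")
            n = min(chunk_len, length - c * chunk_len)
            with open(path, "wb") as f:
                # Extend the file to its full size without writing data;
                # on most platforms the new bytes read back as zero.
                f.truncate(n * DOUBLE.size)
            self.paths.append(path)

    def _locate(self, i):
        # The "multiple indices": one global index -> (file, offset).
        if not 0 <= i < self.length:
            raise IndexError(i)
        return divmod(i, self.chunk_len)

    def __getitem__(self, i):
        # Read a single element from disk only when it is accessed.
        chunk, offset = self._locate(i)
        with open(self.paths[chunk], "rb") as f:
            f.seek(offset * DOUBLE.size)
            return DOUBLE.unpack(f.read(DOUBLE.size))[0]

    def __setitem__(self, i, value):
        # Overwrite a single element in place in its backing file.
        chunk, offset = self._locate(i)
        with open(self.paths[chunk], "r+b") as f:
            f.seek(offset * DOUBLE.size)
            f.write(DOUBLE.pack(value))

    def __len__(self):
        return self.length


with tempfile.TemporaryDirectory() as d:
    v = ChunkedFileVector(d, length=10, chunk_len=4)  # files hold 4 + 4 + 2
    v[9] = 3.5  # stored in chunk2.bin at offset 1
    print(v[9], len(v.paths))  # prints: 3.5 3
```

Opening a file on every access keeps the sketch short but is slow. A practical implementation would more likely memory-map or cache whole pages of each file rather than read one element at a time.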

Cheers,
Simon



More information about the R-devel mailing list