[R] Alternatives to merge for large data sets?
Adam D. I. Kramer
adik at ilovebacon.org
Thu Sep 7 08:12:52 CEST 2006
Hello,
I am trying to merge two very large data sets, via
pubbounds.prof <-
merge(x=pubbounds,y=prof,by.x="user",by.y="userid",all=TRUE,sort=FALSE)
which gives me an error of
Error: cannot allocate vector of size 2962 Kb
I am reasonably sure that this is the correct syntax.
The trouble is that pubbounds and prof are large; they are data frames which
take up 70M and 11M respectively when saved as .Rdata files.
I understand from various archive searches that "merge can't handle that,"
because merge's memory use can grow on the order of n^2, which I do not have.
My question is whether there is an alternative to merge that would carry
out the process in a slower, iterative manner, or whether I should just bite
the bullet, write.table the data, and use a Perl script to do the job.
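(For illustration, one iterative alternative I have been considering is to do
the lookup myself with match() rather than merge(), which avoids building a
large intermediate. This is only a sketch with small stand-in data frames,
using the same column names as the call above, and it assumes userid is
unique in prof, i.e. a many-to-one left join:)

```r
# Stand-ins for pubbounds and prof (the real ones are far larger).
pubbounds <- data.frame(user  = c(1, 2, 3, 2), score = c(10, 20, 30, 25))
prof      <- data.frame(userid = c(2, 3, 4),   age   = c(35, 40, 45))

# Left join via match(): find, for each pubbounds$user, the row of prof
# whose userid matches (NA where there is no match). match() returns only
# the first hit, so prof must be keyed uniquely by userid.
idx <- match(pubbounds$user, prof$userid)

# Bind the matched prof columns (minus the key) onto pubbounds;
# NA-indexed rows come back as all-NA, which is what we want for non-matches.
joined <- cbind(pubbounds,
                prof[idx, setdiff(names(prof), "userid"), drop = FALSE])
```

(The same idea extends to processing pubbounds in row chunks, doing the
match() per chunk and appending each result with write.table(append=TRUE),
which keeps only one chunk in memory at a time.)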
Thankful as always,
Adam D. I. Kramer