[R-sig-hpc] using rbind to combine RODBC result sets using ffdf

Jens Oehlschlägel jens.oehlschlaegel at truecluster.com
Tue Sep 21 17:46:26 CEST 2010


Hi Steve,

>I have a fairly large (~4M rows, 27 double columns) data set I am attempting to 
>read in in R on Windows using RODBC and find I am running out of memory.

That is fairly small under RWin64 with enough RAM (roughly 4M x 27 x 8 bytes, about 0.86 GB as doubles), and reasonably small for ff under RWin32.

>I thought I could easily workaround the problem by reading in smaller number of 
>rows and then rbind the results together as below:-

Fair approach, will work. So far we don't have an rbind for ffdf (but all the pieces are there, see below). If we had rbind.ffdf, it would ideally cover the following situations:

1) combining two (or more) ffdf into a third one
2) growing an existing ffdf by another (or more) ffdf
3) growing an existing ffdf by another (or more) data.frame

Technically all the pieces are there. You can create an ffdf and copy it with
    first.ffdf <- as.ffdf(first.data.frame)
    third.ffdf <- clone(first.ffdf)
and grow an ffdf by assigning nrow, as in
    nrow(third.ffdf) <- nrow(third.ffdf) + nrow(second.ffdf)
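
The steps above can be put together in a minimal sketch. It assumes the ff package is installed, that all chunks have the same column names, and that every column is numeric, so no factor levels need to be reconciled. The helper name append_rows is hypothetical and not part of ff; the small data frames stand in for chunks fetched from the database.

```r
library(ff)

# Hypothetical helper (not part of ff): append the rows of a data.frame
# to an existing ffdf. Assumes identical column names and numeric columns
# only; factor columns would need their levels merged first.
append_rows <- function(x, dat) {
  stopifnot(identical(names(x), names(dat)))
  n_old <- nrow(x)
  nrow(x) <- n_old + nrow(dat)        # grow the ffdf by assigning nrow
  x[(n_old + 1L):nrow(x), ] <- dat    # fill the newly added rows
  x
}

# Stand-ins for two chunks read from the database
first.data.frame  <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
second.data.frame <- data.frame(a = c(4, 5),    b = c(40, 50))

first.ffdf <- as.ffdf(first.data.frame)
third.ffdf <- clone(first.ffdf)       # copy, so first.ffdf keeps its own files
third.ffdf <- append_rows(third.ffdf, second.data.frame)

dim(third.ffdf)
```

In a real run the helper would be called once per chunk inside the loop that fetches rows, so that only one chunk at a time has to fit in RAM.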

Then the devil is in the details, such as maintaining appropriate factor levels. An example of how this works is in read.table.ffdf. It should be easy to abstract an rbind.ffdf from this, and I think I will do so soon, but first I have to prepare the next release of ff, which will support RWin64, sorting, ordering and using positive ff integers as subscripts. If you abstract rbind.ffdf from read.table.ffdf before I do, please let me know; I will be happy to include it in ff and give credit.

I recently had a similar request from Xiaobo Gu, who uses a different database driver and who wrote code that covers some of the functionality of read.table.ffdf for his database connection. Maybe he is willing to share the code with you; I cc him.

[r-sig-hpc at r-project.org] is the appropriate list for questions on ff, I cc.

Cheers
Jens
