[R] cbind alternate

Marc Schwartz marc_schwartz at me.com
Fri Jan 6 19:39:29 CET 2012


On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:

> I have two one-dimensional lists of elements and want to cbind them and
> then write the result to a file. Each list has more than a million
> entries. R is taking a long time to perform this operation.
> 
> Is there any alternate way to perform cbind?
> 
> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
> 
> z = cbind(x,y)   # hanging the machine
> 
> write.table(z,'out.txt')



The issue is not the use of cbind(), but that write.table() can be slow with data frames, where each column may be a different class (data type) and requires separate formatting for output. This is referenced in the Note section of ?write.table:

write.table can be slow for data frames with large numbers (hundreds or more) of columns: this is inevitable as each column could be of a different class and so must be handled separately. If they are all of the same class, consider using a matrix instead.


I suspect that in this case, although you don't have a large number of columns, you do have a large number of rows, so the per-column handling described in that Note may still add up to a noticeable cost.
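
One way to check where the time actually goes is to time the two steps separately. The following is only a sketch: 'x' and 'y' are simulated stand-ins for the two columns (the real table1 and table2 are not shown in the post), and the output goes to a temporary file rather than 'out.txt'.

```r
# Simulated stand-ins for the two columns; the real data are not available here
n <- 1000000
x <- rnorm(n)
y <- rnorm(n)

# Time the column binding on its own
t_bind <- system.time(z <- cbind(x, y))

# Time the write on its own, to a temporary file
out <- tempfile(fileext = ".txt")
t_write <- system.time(write.table(z, file = out))

print(t_bind)
print(t_write)

# Clean up the temporary file
unlink(out)
```

Comparing the two "elapsed" values should show which step dominates on a given machine.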

If all of the columns in your source tables are of the same type (e.g., all numeric), coerce 'z' to a matrix and then try using write.table().

z <- matrix(rnorm(1000000 * 6), ncol = 6)

> str(z)
 num [1:1000000, 1:6] -0.713 0.79 -0.538 0.945 1.621 ...

> system.time(write.table(z, file = "test.txt"))
   user  system elapsed 
 12.664   0.292  13.029 


The resulting file is about 118 MB on my system.
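
To see whether the matrix form makes a difference on a particular system, the same data can be written both ways and timed. This is a rough sketch with simulated all-numeric data; the names 'df' and 'm' are purely illustrative, and the row count is kept smaller than a million so that it runs quickly.

```r
# Simulated all-numeric data (illustrative only)
n <- 200000
df <- data.frame(x = rnorm(n), y = rnorm(n))

# as.matrix() returns a numeric matrix when every column is numeric
m <- as.matrix(df)

# Write each form to its own temporary file and time it
f_df <- tempfile(fileext = ".txt")
f_m  <- tempfile(fileext = ".txt")

t_df <- system.time(write.table(df, file = f_df))
t_m  <- system.time(write.table(m,  file = f_m))

print(t_df)
print(t_m)

# Clean up the temporary files
unlink(c(f_df, f_m))
```

The timings will vary with the system and the R version, so none are quoted here.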

HTH,

Marc Schwartz


