[R] cbind alternate
Marc Schwartz
marc_schwartz at me.com
Fri Jan 6 19:39:29 CET 2012
On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:
> I have two one dimensional list of elements and want to perform cbind and
> then write into a file. The number of entries are more than a million in
> both lists. R is taking a lot of time performing this operation.
>
> Is there any alternate way to perform cbind?
>
> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
>
> z = cbind(x,y) //hanging the machine
>
> write.table(z,'out.txt)
The issue is not the use of cbind(), but that write.table() can be slow with data frames, where each column may be a different class (data type) and requires separate formatting for output. This is referenced in the Note section of ?write.table:
  write.table can be slow for data frames with large numbers (hundreds or more) of columns: this is inevitable as each column could be of a different class and so must be handled separately. If they are all of the same class, consider using a matrix instead.
I suspect that in this case, although you do not have a large number of columns, you do have a large number of rows, so the per-column handling still adds up.
If all of the columns in your source tables are of the same type (e.g., all numeric), coerce 'z' to a matrix and then try using write.table():
> z <- matrix(rnorm(1000000 * 6), ncol = 6)
> str(z)
 num [1:1000000, 1:6] -0.713 0.79 -0.538 0.945 1.621 ...
> system.time(write.table(z, file = "test.txt"))
   user  system elapsed
 12.664   0.292  13.029
The resultant file is about 118 MB on my system.
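To check whether the data frame versus matrix distinction matters for your own data, a rough comparison along the following lines may help. This is only a sketch: the vectors 'x' and 'y' are hypothetical random stand-ins for the two columns in the original post, the row count is reduced to keep the run short, and the output goes to temporary files. Timings will vary by system, and with only two columns the difference may be small.

```r
# Hypothetical stand-ins for the two source columns (not the poster's data)
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)

z_df  <- data.frame(x = x, y = y)  # data frame: each column is handled separately
z_mat <- cbind(x, y)               # two numeric vectors give a numeric matrix

# Confirm what is actually being passed to write.table()
sapply(z_df, class)
is.matrix(z_mat)

# Write both objects to temporary files and time each call
f_df  <- tempfile(fileext = ".txt")
f_mat <- tempfile(fileext = ".txt")

t_df  <- system.time(write.table(z_df,  file = f_df))
t_mat <- system.time(write.table(z_mat, file = f_mat))

# Elapsed seconds for each variant
c(data.frame = t_df[["elapsed"]], matrix = t_mat[["elapsed"]])
```

If is.matrix() returns FALSE for your own 'z', as.matrix(z) will coerce it, provided all columns share one type.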
HTH,
Marc Schwartz