[R] cbind alternate

Marc Schwartz marc_schwartz at me.com
Fri Jan 6 19:58:09 CET 2012


On Jan 6, 2012, at 12:39 PM, Marc Schwartz wrote:

> On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:
> 
>> I have two one-dimensional lists of elements and want to perform cbind and
>> then write the result to a file. The number of entries is more than a million
>> in both lists. R is taking a lot of time to perform this operation.
>> 
>> Is there any alternate way to perform cbind?
>> 
>> x = table1[1:1000000,1]
>> y = table2[1:1000000,5]
>> 
>> z = cbind(x,y)   # hanging the machine
>> 
>> write.table(z,'out.txt')
> 

Apologies, I misread where the hang-up was. It is in the use of cbind() prior to calling write.table(), not in write.table() itself.

I am not sure why that step is taking a long time unless, as already mentioned, you are short on available memory. This runs quickly for me:

x <- matrix(rnorm(1000000 * 3), ncol = 3)
y <- matrix(rnorm(1000000 * 3), ncol = 3)
 
> system.time(z <- cbind(x, y))
   user  system elapsed 
  0.039   0.025   0.065 

> str(z)
 num [1:1000000, 1:6] -0.5102 1.8776 2.4635 0.2982 0.0901 ...
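
If memory is the suspect, a rough check along the following lines may help. This is only a sketch: it reuses the matrix sizes from the example above rather than your actual objects, and the numbers reported will differ from machine to machine.

```r
# Illustrative only: same dimensions as the matrices above, not your data
x <- matrix(rnorm(1000000 * 3), ncol = 3)
y <- matrix(rnorm(1000000 * 3), ncol = 3)

# Memory held by a single input, in megabytes
print(object.size(x), units = "Mb")

# cbind() allocates a new object holding a copy of both inputs
z <- cbind(x, y)
print(object.size(z), units = "Mb")

# Run garbage collection and report current memory use
gc()
```

Since the result holds a copy of both inputs, the peak memory needed is about twice the combined size of 'x' and 'y'.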


To give an example with two data frames containing differing data types, let's use the built-in 'iris' data set, which has 5 columns and 150 rows. Let's create a new version with over a million rows by repeating those rows 7000 times (150 * 7000 = 1,050,000):

iris.new <- iris[rep(seq(nrow(iris)), 7000), ]

> str(iris.new)
'data.frame':	1050000 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


> system.time(iris.new2 <- cbind(iris.new, iris.new))
   user  system elapsed 
  5.289   0.282   5.658 


> str(iris.new2)
'data.frame':	1050000 obs. of  10 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


You might check the structures of your 'x' and 'y' with str() to be sure that there is not something amiss with either one.
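
As a sketch of what such a check could look like: the objects below are small hypothetical stand-ins, not your data, and the output file is a temporary placeholder. One thing worth ruling out, given that you describe them as lists, is that 'x' and 'y' are true R lists, in which case cbind() returns a matrix of mode "list" rather than a plain numeric or character matrix.

```r
# Hypothetical stand-ins for the poster's 'x' and 'y'
x <- as.list(1:5)           # a true list, one element per entry
y <- as.list(letters[1:5])  # likewise

# Inspect what you actually have
str(x)
class(y)

# cbind() on lists gives a matrix of mode "list"
z_list <- cbind(x, y)
typeof(z_list)

# Flatten to atomic vectors, then build the data frame directly,
# which avoids cbind() altogether
z <- data.frame(x = unlist(x), y = unlist(y), stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")  # placeholder path
write.table(z, out_file, row.names = FALSE, quote = FALSE)
```

If str() shows plain atomic vectors of equal length, the structure is probably not the problem and memory is the more likely culprit.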

HTH,

Marc Schwartz


