[R] cbind alternate
Marc Schwartz
marc_schwartz at me.com
Fri Jan 6 19:58:09 CET 2012
On Jan 6, 2012, at 12:39 PM, Marc Schwartz wrote:
> On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:
>
>> I have two one-dimensional lists of elements and want to perform cbind and
>> then write the result to a file. The number of entries is more than a million in
>> both lists. R is taking a lot of time performing this operation.
>>
>> Is there any alternate way to perform cbind?
>>
>> x = table1[1:1000000,1]
>> y = table2[1:1000000,5]
>>
>> z = cbind(x,y)  # hanging the machine
>>
>> write.table(z,'out.txt')
>
Apologies, I misread where the hang-up was. It is in the call to cbind() prior to write.table(), not in write.table() itself.
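For reference, here is a sketch of the combine-and-write steps from the question that runs as written. It is not taken from the thread and makes no claim of being faster; it simply wraps the two vectors in a data frame and passes that to write.table() in one call. The random vectors are hypothetical stand-ins for table1[, 1] and table2[, 5], and a temporary file is used in place of 'out.txt'.

```r
# Hypothetical stand-ins for table1[, 1] and table2[, 5];
# random numbers are used here instead of the real tables.
n <- 1e6
x <- rnorm(n)
y <- rnorm(n)

# Temporary output file; substitute your own path, e.g. "out.txt".
out <- tempfile(fileext = ".txt")

# Wrap the two vectors in a data frame and write them in one call.
# row.names = FALSE skips writing a million row labels;
# quote = FALSE leaves the header names unquoted.
write.table(data.frame(x = x, y = y), file = out,
            sep = "\t", row.names = FALSE, quote = FALSE)

# Show the header line and the first two data rows.
print(readLines(out, n = 3))
```

Note that the question's original line lacked the closing quote in 'out.txt', which on its own would leave R waiting for more input.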
I am not sure why that step is taking so long unless, as already mentioned, you are short on available memory. This runs quickly for me:
> x <- matrix(rnorm(1000000 * 3), ncol = 3)
> y <- matrix(rnorm(1000000 * 3), ncol = 3)
> system.time(z <- cbind(x, y))
   user  system elapsed
  0.039   0.025   0.065
> str(z)
num [1:1000000, 1:6] -0.5102 1.8776 2.4635 0.2982 0.0901 ...
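To check whether memory is the limiting factor, a sketch along these lines estimates how much space the combined result needs and what R is currently using. The two random vectors are hypothetical stand-ins for the columns in the question; object.size() and gc() are standard base R functions.

```r
# Two numeric vectors of one million elements each, as hypothetical
# stand-ins for the columns in the question.
n <- 1e6
x <- rnorm(n)
y <- rnorm(n)

# cbind() of two plain vectors gives an n x 2 numeric matrix.
print(system.time(z <- cbind(x, y)))
print(dim(z))

# Each double takes 8 bytes, so z needs roughly 2 * 1e6 * 8 = 16e6 bytes
# (about 15 Mb), plus a small amount of overhead.
print(object.size(z), units = "Mb")

# gc() runs a garbage collection and reports how much memory R is using;
# a much larger total than expected may point to other big objects.
print(gc())
```

If two columns of a million numbers need only about 15 Mb, a machine that hangs on this step is probably either swapping because of other objects in the workspace or dealing with inputs that are not the simple vectors they appear to be.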
To give an example with two data frames containing differing data types, let's use the built-in 'iris' data set, which has 5 columns and 150 rows by default. Let's create a new version with over a million rows:
> iris.new <- iris[rep(seq(nrow(iris)), 7000), ]
> str(iris.new)
'data.frame': 1050000 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> system.time(iris.new2 <- cbind(iris.new, iris.new))
   user  system elapsed
  5.289   0.282   5.658
> str(iris.new2)
'data.frame': 1050000 obs. of 10 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
You might check the structures of your 'x' and 'y' with str() to be sure that nothing is amiss with either one.
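As one illustration of what could be amiss (an assumption, since the thread does not show 'x' and 'y'): if either column is a factor, cbind() on two vectors returns a plain matrix holding the factor's integer codes rather than its labels, whereas data.frame() keeps each column's type. The small vectors below are hypothetical.

```r
# Hypothetical stand-ins for table1[, 1] and table2[, 5]; here 'x' is a factor.
x <- factor(c("a", "b", "a"))
y <- c(1.5, 2.5, 3.5)

# Inspect each object and confirm the lengths match before combining.
str(x)
str(y)
stopifnot(length(x) == length(y))

# cbind() on two vectors returns a matrix, so the factor is reduced
# to its integer codes (1, 2, 1) and the labels are lost.
m <- cbind(x, y)
print(m)

# data.frame() keeps 'x' as a factor and 'y' as numeric.
d <- data.frame(x, y)
str(d)
```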
HTH,
Marc Schwartz