[R] post
Alexey Ush
ushan26 at yahoo.com
Mon Sep 13 22:26:39 CEST 2010
Hello,
I have a question regarding how to speed up t.test on a large dataset. For example, I have a table "tab" which looks like:
a b c d e f g h....
1
2
3
4
5
...
100000
dim(tab) is 100000 x 100
I need to run a t.test for each row on two subsets of the columns, i.e. to compare the a b d group against the e f g group at each row (a one-row example follows the two subset sketches below).
subset 1:
a b d
1
2
3
4
5
...
100000
subset 2:
e f g
1
2
3
4
5
...
100000
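For one row, the test I need is just this (a toy example with made-up data; column names a..h and a smaller row count are assumed for illustration):

# toy table with the same layout (random values, 1000 rows for speed)
set.seed(1)
tab <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)
colnames(tab) <- c(letters[1:8], paste("x", 9:100, sep = ""))

# the test for row 1: columns a,b,d against columns e,f,g
t.test(tab[1, c("a", "b", "d")], tab[1, c("e", "f", "g")])$p.value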
100000 t.tests for each row over these two subsets take around 1 min. The problem is that I have around 10000 different combinations of such subsets, therefore 1 min * 10000 = 10000 min if I use a "for" loop like this:
n1 <- 10000   # number of subset combinations
n2 <- 100000  # number of rows
for (i1 in 1:n1) {
  # v5 and v6 are vectors containing the variable names for the two
  # subsets (they are different for each combination)
  for (i2 in 1:n2) {
    p <- t.test(tab[i2, v5], tab[i2, v6])$p.value
    # ... store/use p ...
  }
}
My question: is there a more efficient way to do these computations in a short period of time? Any packages, like plyr? Maybe direct calculations instead of using the t.test function?
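For instance, I was thinking of something like this (untested) sketch with a hypothetical helper row.t.pvals() that computes Welch t statistics for all rows at once via rowMeans()/rowSums(), assuming tab is a numeric matrix; it should match what t.test() computes by default for one pair of subsets:

row.t.pvals <- function(tab, v5, v6) {
  x <- tab[, v5, drop = FALSE]   # subset 1 columns, e.g. c("a","b","d")
  y <- tab[, v6, drop = FALSE]   # subset 2 columns, e.g. c("e","f","g")
  nx <- ncol(x); ny <- ncol(y)
  mx <- rowMeans(x); my <- rowMeans(y)
  vx <- rowSums((x - mx)^2) / (nx - 1)   # row-wise sample variances
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2 <- vx / nx + vy / ny               # squared standard error per row
  tstat <- (mx - my) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom, as in t.test()'s default
  df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  2 * pt(-abs(tstat), df)   # two-sided p-values, one per row
}

# one subset combination, all rows at once:
pv <- row.t.pvals(tab, c("a", "b", "d"), c("e", "f", "g"))

Each call would handle all rows of one subset combination at once, so only the loop over the 10000 combinations would remain.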
Thank you.