[R] post

Alexey Ush ushan26 at yahoo.com
Mon Sep 13 22:26:39 CEST 2010


Hello,

I have a question about how to speed up t.test on a large dataset. For example, I have a table "tab" which looks like this:

	a	b	c	d	e 	f	g	h....
1	
2
3
4
5

...

100000

dim(tab) is 100000 x 100



I need to do a t.test for each row on two subsets of the columns, i.e., to compare the group a, b, d against the group e, f, g at each row (a one-row example in R follows the two subsets below).


subset 1:					
	a	b	d
1	
2
3
4
5

...

100000


subset 2:
	e	f	g
1	
2
3
4
5

...

100000
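In R terms, a single such comparison for one row looks like this (the column names in v5 and v6 are just for illustration):

v5 <- c("a", "b", "d")   # columns of subset 1
v6 <- c("e", "f", "g")   # columns of subset 2
t.test(tab[1, v5], tab[1, v6])$p.value   # two-sample t-test for row 1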

100000 t.tests (one per row) for these two subsets take around 1 min. The problem is that I have around 10000 different combinations of such subsets, so it would be 1 min * 10000 = 10000 min if I use a "for" loop like this:

n1 <- 10000   # number of subset combinations
n2 <- 100000  # number of rows
for (i1 in 1:n1) {
  for (i2 in 1:n2) {
    ## v5 and v6 are vectors containing the variable names for the two
    ## subsets (they differ for each combination i1)
    t.test(tab[i2, v5], tab[i2, v6])$p.value
  }
}


My question: is there a more efficient way to do these computations in a short period of time? Any packages, like plyr? Maybe direct calculations instead of using the t.test function?
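For instance, would something along these lines work? This is only a rough sketch of a direct Welch t-test (the t.test default, var.equal = FALSE) computed over all rows at once with rowMeans/rowSums; the helper name row_t_pvalues is just mine, and I have not checked it carefully against t.test:

row_t_pvalues <- function(tab, v5, v6) {
  x <- as.matrix(tab[, v5, drop = FALSE])
  y <- as.matrix(tab[, v6, drop = FALSE])
  nx <- ncol(x); ny <- ncol(y)
  mx <- rowMeans(x); my <- rowMeans(y)
  vx <- rowSums((x - mx)^2) / (nx - 1)   # per-row sample variances
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2 <- vx / nx + vy / ny               # squared standard error of the difference
  tstat <- (mx - my) / sqrt(se2)
  ## Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  2 * pt(-abs(tstat), df)                # two-sided p-values, one per row
}

pvals <- row_t_pvalues(tab, v5, v6)   # all 100000 p-values in one call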


Thank you. 




