[R] post

Juliet Hannah juliet.hannah at gmail.com
Sat Sep 18 15:31:18 CEST 2010


See if rowttests is any faster.

library(genefilter)
?rowttests

You have to install it from Bioconductor. I've used this on large datasets, but I haven't compared timings.
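
A minimal sketch of how it could fit the problem below, assuming tab, v5, and v6 are the objects from the quoted post (untested against your data):

library(genefilter)

m   <- as.matrix(tab[, c(v5, v6)])   # the two column groups, side by side
grp <- factor(rep(c("A", "B"), c(length(v5), length(v6))))
res <- rowttests(m, grp)             # one t-test per row, computed in C
head(res$p.value)

One caveat: rowttests assumes equal variances in the two groups, while t.test defaults to the Welch (unequal-variance) version, so the p-values will not match t.test exactly.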

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush <ushan26 at yahoo.com> wrote:
> Hello,
>
> I have a question about how to speed up t.test on a large dataset. For example, I have a table "tab" which looks like:
>
>        a       b       c       d       e       f       g       h....
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 100000
>
> dim(tab) is 100000 x 100
>
>
>
> I need to do a t.test for each row on two subsets of columns, i.e. to compare the a, b, d group against the e, f, g group at each row.
>
>
> subset 1:
>        a       b       d
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 100000
>
>
> subset 2:
>        e       f       g
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 100000
>
> The 100000 t.tests (one per row) for these two subsets take around 1 min. The problem is that I have around 10000 different combinations of such subsets, so 1 min * 10000 = 10000 min if I use a "for" loop like this:
>
> n1 <- 10000   # number of subset combinations
> n2 <- 100000  # number of rows
> for (i1 in 1:n1) {
>   # v5 and v6 are vectors containing the variable names for the
>   # two subsets (they are different for each combination)
>   p <- numeric(n2)
>   for (i2 in 1:n2) {
>     p[i2] <- t.test(tab[i2, v5], tab[i2, v6])$p.value
>   }
>   # ... store or summarise p for this combination ...
> }
>
>
> My question: is there a more efficient way to do these computations in a shorter period of time? Any packages, like plyr? Maybe direct calculations instead of using the t.test function?
>
>
> Thank you.
>
>
>
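On the "direct calculations instead of t.test" question: the Welch statistic is simple enough to vectorise across rows in base R, so each subset combination costs a few whole-matrix operations instead of 100000 t.test calls. A minimal sketch, assuming tab holds numeric data and v5/v6 name the two column groups (row_welch is just an illustrative name, not an existing function):

row_welch <- function(x, y) {
  # x, y: numeric matrices, one row per test
  nx <- ncol(x); ny <- ncol(y)
  mx <- rowMeans(x); my <- rowMeans(y)
  vx <- rowSums((x - mx)^2) / (nx - 1)   # per-row sample variances
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2 <- vx / nx + vy / ny
  tstat <- (mx - my) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  2 * pt(-abs(tstat), df)                # two-sided p-values, as t.test's default
}

p <- row_welch(as.matrix(tab[, v5]), as.matrix(tab[, v6]))

The outer loop then runs over the 10000 combinations only, with all 100000 rows handled at once inside each iteration.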


