[R] How to extract x rows to get x pvalues using t.test
Thomas Lumley
tlumley at u.washington.edu
Wed Mar 16 16:16:49 CET 2005
On Tue, 15 Mar 2005, Liaw, Andy wrote:
>> From: Adaikalavan Ramasamy
>>
>> You will need to _apply_ the t-test row by row.
>>
>> apply( genes, 1, function(x) t.test( x[1:2], x[3:4] )$p.value )
>>
>> apply() is a C optimised version of for. Running the above code on a
>> dataset with 56000 rows and 4 columns took about 63 seconds on my 1.6
>> GHz Pentium machine with 512 Mb RAM. See help("apply") for
>> more details.
>
> That's not true. In R, there's a for loop hidden inside apply() (just look
> at the source). In S-PLUS, C level looping is done in some situations, and
> for others lapply() is used.
>
It's slightly more complicated than this. lapply() really is a C-level
loop and apply() eventually calls it.
Now, whatever happens inside apply(), it is still true that t.test() has to
be called 56,000 times, providing a lower bound on the time apply() can
take. In this case I would be very surprised if apply() saved any time.
What would save time is writing a stripped-down t-test function,
especially as only the p-value is being used.
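To illustrate what a stripped-down version could look like, here is a minimal sketch. It assumes the default t.test() settings (two-sided Welch test, unequal variances) and computes all the p-values at once from row means and row variances. The name row_welch_p and the simulated genes matrix are made up for illustration; they are not from the original thread.

```r
# Hypothetical helper: two-sided Welch t-test p-values for every row of a
# matrix, comparing columns g1 against columns g2, without calling t.test().
row_welch_p <- function(m, g1, g2) {
  x <- m[, g1, drop = FALSE]
  y <- m[, g2, drop = FALSE]
  nx <- ncol(x)
  ny <- ncol(y)
  mx <- rowMeans(x)
  my <- rowMeans(y)
  # Row-wise sample variances (the row mean is recycled down each column)
  vx <- rowSums((x - mx)^2) / (nx - 1)
  vy <- rowSums((y - my)^2) / (ny - 1)
  # Squared standard errors of the two group means
  se2x <- vx / nx
  se2y <- vy / ny
  tstat <- (mx - my) / sqrt(se2x + se2y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (se2x + se2y)^2 / (se2x^2 / (nx - 1) + se2y^2 / (ny - 1))
  2 * pt(-abs(tstat), df)
}

# Simulated stand-in for the poster's data (assumption: 4 numeric columns)
set.seed(1)
genes <- matrix(rnorm(1000 * 4), ncol = 4)

p_fast <- row_welch_p(genes, 1:2, 3:4)
p_ref  <- apply(genes, 1, function(x) t.test(x[1:2], x[3:4])$p.value)

# The two sets of p-values should agree up to floating-point tolerance
all.equal(p_fast, p_ref)
```

The sketch skips everything else t.test() does (argument checking, NA handling, confidence intervals), so it is only appropriate when the input is a clean numeric matrix.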
The real problem with apply is that when the objects involved are large,
apply() can be substantially slower because of greater memory use. As a
concrete example, an apply() on a 10000x757 set of replicate weights in
the survey package used half as much memory when turned into a for() loop.
As a result it ran several times faster on my laptop (where it was paging
heavily) and slightly faster on my desktop (which has rather more memory).
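For comparison, the apply() call from the quoted message can be written as an explicit for() loop with a preallocated result vector, roughly as below. This is a sketch on a small simulated matrix (an assumption, not the survey package code), and it makes no claim about the memory savings you would see on any particular data set.

```r
# Simulated stand-in for the poster's data (assumption: 4 numeric columns)
set.seed(1)
genes <- matrix(rnorm(200 * 4), ncol = 4)

# Preallocate the result, then fill it one row at a time
pvals <- numeric(nrow(genes))
for (i in seq_len(nrow(genes))) {
  pvals[i] <- t.test(genes[i, 1:2], genes[i, 3:4])$p.value
}

# Same computation via apply(), for checking
pvals_apply <- apply(genes, 1, function(x) t.test(x[1:2], x[3:4])$p.value)
all.equal(pvals, pvals_apply)
```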
-thomas