[Bioc-devel] rfc - rowttests in genefilter package

Wolfgang Huber whuber at embl.de
Thu Jul 16 12:09:42 CEST 2009


Hi,

I noticed that in this function (which I wrote), if the number of samples 
in each group is large (more than, say, 1000), floating-point rounding 
errors become significant, to the point of invalidating the results. 
Essentially, the reason is that I compute the within-group variances from

    ss - s * s / n

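where ss is the sum of squared values, s is the sum of values, and n is 
the sample size [1]. This expression is the sum of squared deviations 
from the group mean. When ss and s * s / n are large and nearly equal, 
their difference loses most of its significant digits (catastrophic 
cancellation).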
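To illustrate the mechanism, here is a minimal C sketch that computes the sum of squared deviations twice: once with the one-pass formula ss - s * s / n, and once with a two-pass method that subtracts the mean before squaring. The function names ssd_naive and ssd_two_pass are hypothetical and not taken from rowttests.c. The test data are contrived (a small spread around a large offset) to make the effect visible with only four values.

```c
#include <assert.h>
#include <stddef.h>

/* One-pass formula: accumulate the sum s and the sum of squares ss,
   then return ss - s*s/n. The final subtraction of two large, nearly
   equal numbers is where precision is lost. */
double ssd_naive(const double *x, size_t n)
{
    double s = 0.0, ss = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        s += x[i];
        ss += x[i] * x[i];
    }
    return ss - s * s / n;
}

/* Two-pass alternative: compute the mean first, then sum the squared
   deviations from it. The large common offset is removed before any
   squaring happens, so no large, nearly equal terms are subtracted. */
double ssd_two_pass(const double *x, size_t n)
{
    double s = 0.0, ssd = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        s += x[i];
    double mean = s / n;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;  /* deviation from the mean */
        ssd += d * d;
    }
    return ssd;
}
```

For the four values 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16, the two-pass version returns exactly 90. The one-pass version cannot: ss and s * s / n are both about 4e18, where adjacent doubles are 512 apart, so their difference is a multiple of 512. The price of the two-pass method is a second pass over each row.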

I've added a warning to the man page asking people to use the function 
only when the number of samples ranges from dozens to a few hundred. I 
can think of a few obvious ways to make the code less vulnerable to the 
finite precision of floating-point arithmetic, but I am sure this 
problem has been solved many times before, and I would like to ask for 
pointers or suggestions.
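One standard remedy that keeps the single pass over each row is Welford's update, which accumulates a running mean and a running sum of squared deviations instead of raw sums. The sketch below is a minimal illustration assuming double-precision input. The type and function names (welford_t, welford_add, welford_ssd, welford_var) are hypothetical, and this is not necessarily how rowttests.c was or should be changed.

```c
#include <assert.h>
#include <stddef.h>

/* Running state for Welford's one-pass update. */
typedef struct {
    size_t n;     /* number of values seen so far */
    double mean;  /* running mean of the values seen so far */
    double m2;    /* running sum of squared deviations from the mean */
} welford_t;

/* Incorporate one new value x into the running state. */
void welford_add(welford_t *w, double x)
{
    w->n += 1;
    double delta = x - w->mean;      /* deviation from the old mean */
    w->mean += delta / w->n;         /* update the mean */
    w->m2 += delta * (x - w->mean);  /* old deviation times new deviation */
}

/* Sum of squared deviations: the quantity ss - s*s/n is meant to give. */
double welford_ssd(const welford_t *w)
{
    return w->m2;
}

/* Sample variance with denominator n - 1; requires at least two values. */
double welford_var(const welford_t *w)
{
    assert(w->n > 1);
    return w->m2 / (w->n - 1);
}
```

Starting from a zero-initialized welford_t and adding 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16 gives a sum of squared deviations of 90 and a variance of 30. For a pooled two-group t statistic, one could keep one accumulator per group and add the two m2 values before dividing by n1 + n2 - 2.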

Best wishes
      Wolfgang

[1] https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/genefilter/src/rowttests.c

-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


