[Bioc-devel] rfc - rowttests in genefilter package
Wolfgang Huber
whuber at embl.de
Thu Jul 16 12:09:42 CEST 2009
Hi,
I noticed that in rowttests (which I wrote), if the number of samples
in each group is large (more than, say, 1000), floating-point errors
become significant, to the point of invalidating the results.
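Essentially, the reason is that I compute the within-group variances via
ss - s * s / n
where ss is the sum of squared values, s is the sum of values, and n is the
sample size [1]. When n is large, ss and s * s / n are two large, nearly
equal numbers, so the subtraction cancels most of their significant digits.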
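
To illustrate the effect, here is a minimal Python sketch. It is not the C code from genefilter; the function names ss_naive and ss_two_pass and the test data are made up for illustration. It compares the one-pass formula with a two-pass calculation that subtracts the group mean before squaring.

```python
def ss_naive(x):
    """Sum of squared deviations via the one-pass formula ss - s*s/n."""
    n = len(x)
    s = 0.0   # running sum of values
    ss = 0.0  # running sum of squared values
    for v in x:
        s += v
        ss += v * v
    return ss - s * s / n


def ss_two_pass(x):
    """Sum of squared deviations computed around the mean (two passes)."""
    n = len(x)
    s = 0.0
    for v in x:
        s += v
    mean = s / n
    acc = 0.0
    for v in x:
        d = v - mean  # small number, so squaring it loses little precision
        acc += d * d
    return acc


# Hypothetical data: 3000 samples with a large offset and a small spread.
# The deviations from the mean are -1, 0, +1 (1000 times each), so the
# exact sum of squared deviations is 2000.
x = [1e8 + d for d in (0.0, 1.0, 2.0) * 1000]

print("one-pass:", ss_naive(x))
print("two-pass:", ss_two_pass(x))
```

In this example ss and s * s / n are both about 3e19, where neighbouring double-precision numbers are 4096 apart, so the one-pass result can only be a multiple of 4096 and cannot equal 2000. The two-pass version costs a second sweep over the data but works with small deviations throughout.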
I've added a warning to the man page asking people to use the function
only when the number of samples is between dozens and a few hundred. I can
think of a few obvious ways to make the code less vulnerable to the
finite precision of floating-point arithmetic, but I am sure this
problem has been solved many times before and would like to ask for
pointers or suggestions.
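
One well-known candidate is Welford's single-pass update, which keeps a running mean and a running sum of squared deviations instead of the raw sums. The Python sketch below is only an illustration of that technique under my own naming (welford_ss is a hypothetical name); it is not what rowttests currently does.

```python
def welford_ss(x):
    """Return (mean, sum of squared deviations) in a single pass."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, v in enumerate(x, start=1):
        delta = v - mean          # deviation from the old mean
        mean += delta / k         # update the running mean
        m2 += delta * (v - mean)  # old deviation times new deviation
    return mean, m2


mean, m2 = welford_ss([1.0, 2.0, 3.0, 4.0])
n = 4
variance = m2 / (n - 1)  # unbiased sample variance
```

The appeal of this variant is that it stays single-pass, like the current formula, while never forming the large, nearly equal quantities that cancel.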
Best wishes
Wolfgang
[1]
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/genefilter/src/rowttests.c
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber