[Bioc-devel] rfc - rowttests in genefilter package

Steven McKinney smckinney at bccrc.ca
Thu Jul 16 21:31:01 CEST 2009


Hi Wolfgang, 

Two issues:

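1) The one-pass formula you show below is known
to have numerical accuracy problems: when the mean
is large relative to the spread, ss and s * s / n
are two large, nearly equal numbers, and their
difference loses most of its significant digits
(catastrophic cancellation).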

Why not just use the R var() function
on the data that you used to calculate
ss and s (or the C code that
implements it)? I believe this issue
has been adequately handled there, though
I haven't read the source code.
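For illustration, here is a minimal single-pass alternative that avoids the
cancellation: Welford's updating algorithm, which keeps a running mean and a
running sum of squared deviations instead of raw sums. This is a swapped-in
textbook technique, not a claim about how R's var() is implemented; the names
welford_t, welford_add, welford_var and welford_var_array are hypothetical.

```c
#include <stddef.h>

/* Running state for Welford's algorithm (hypothetical helper type). */
typedef struct {
    size_t n;     /* number of values seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} welford_t;

/* Add one observation x to the running state. */
static void welford_add(welford_t *w, double x)
{
    double delta = x - w->mean;
    w->n += 1;
    w->mean += delta / (double) w->n;
    /* Uses the old deviation (delta) and the deviation from the updated mean. */
    w->m2 += delta * (x - w->mean);
}

/* Unbiased variance m2 / (n - 1); returns 0 for fewer than two values. */
static double welford_var(const welford_t *w)
{
    return (w->n > 1) ? w->m2 / (double) (w->n - 1) : 0.0;
}

/* Convenience wrapper: variance of x[0..n-1] in a single pass. */
double welford_var_array(const double *x, size_t n)
{
    welford_t w = {0, 0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        welford_add(&w, x[i]);
    return welford_var(&w);
}
```

Because only one pass is needed, the same update could in principle be run
per group while scanning each row once, which is the access pattern a
row-wise C routine already has.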

2) Is that the correct formula?
An unbiased variance estimate would be

(ss - s * s / n)/(n-1)
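To see the accuracy problem concretely, the small C sketch below compares the
one-pass formula (ss - s * s / n)/(n-1) with a two-pass version that computes
the mean first and then sums squared deviations from it. The function names
var_naive and var_two_pass are hypothetical. Adding a large constant such as
1e9 to every value leaves the true variance unchanged but makes ss and
s * s / n huge and nearly equal; on typical IEEE double-precision hardware the
one-pass result is then visibly wrong, while the two-pass result is not.

```c
#include <stddef.h>

/* One-pass formula: (ss - s*s/n) / (n - 1). Prone to cancellation. */
double var_naive(const double *x, size_t n)
{
    double s = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i];
        ss += x[i] * x[i];
    }
    return (ss - s * s / (double) n) / (double) (n - 1);
}

/* Two-pass: compute the mean first, then sum squared deviations from it. */
double var_two_pass(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    double mean = s / (double) n;

    double ss_dev = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        ss_dev += d * d;
    }
    return ss_dev / (double) (n - 1);
}
```

For the values 4, 7, 13 and 16 both functions return 30; after shifting every
value by 1e9 only the two-pass version still does.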

Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney at bccrc.ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3

Canada


> -----Original Message-----
> From: bioc-devel-bounces at stat.math.ethz.ch [mailto:bioc-devel-
> bounces at stat.math.ethz.ch] On Behalf Of Wolfgang Huber
> Sent: Thursday, July 16, 2009 3:10 AM
> To: Bioconductor Developers
> Subject: [Bioc-devel] rfc - rowttests in genefilter package
> 
> Hi,
> 
> I noted in this function (which I wrote) that if the number of samples
> in each group is large (more than, say, 1000), floating point errors
> become significant, to the point of invalidating the results.
> Essentially, the reason is that I compute the within-group variances
> via
> 
>     ss - s * s / n
> 
> where ss is the sum of squared values, s is the sum of values, and n
> the sample size [1].
> 
> I've added a warning to the man page asking people only to use the
> function when the number of samples is dozens to a few hundred. I can
> think of a few obvious ways to make the code less vulnerable to the
> finite precision of floating point arithmetic, but I am sure this
> problem has been solved many times before and would like to ask for
> pointers or suggestions.
> 
> Best wishes
>       Wolfgang
> 
> [1] https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/genefilter/src/rowttests.c
> 
> -------------------------------------------------------
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
> 
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


