[Bioc-devel] rfc - rowttests in genefilter package
paboyoun at fhcrc.org
Thu Jul 16 21:45:27 CEST 2009
There are robust one-pass algorithms for calculating variances, if that
is what you are interested in. Wikipedia has a nice summary of
algorithms for calculating variance. Here is the link to the robust
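As a minimal sketch of one such one-pass method, here is Welford's update written in Python. The language choice and the name `welford_variance` are ours; this is not code from genefilter or R. It keeps a running mean and a running sum of squared deviations, so it never forms the two large quantities ss and s*s/n that nearly cancel.

```python
def welford_variance(values):
    """Unbiased sample variance using Welford's one-pass update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if n < 2:
        raise ValueError("need at least two values")
    return m2 / (n - 1)

# Assumed example: four values with variance 30 on top of a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(welford_variance(data))  # 30.0
```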
Steven McKinney wrote:
> Hi Wolfgang,
> Two issues:
> 1) The style of the formula you show below is known
> to have numerical accuracy problems.
> Why not just use the R var() function
> on the data that you used to calculate
> ss and s? (Or the C code that
> implements it?) I believe this issue
> has been adequately handled there, though
> I haven't read the source code.
> 2) Is that the correct formula?
> An unbiased variance calculation would be
> (ss - s * s / n)/(n-1)
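A small Python sketch makes both points concrete. The helper names are hypothetical and neither function is R's var() or its C code. The first computes the unbiased formula directly from s and ss; the second subtracts the mean first and then sums squared deviations (the two-pass method). The data are an assumed example: four values with variance 30 sitting on an offset of 1e9.

```python
def naive_variance(values):
    """Unbiased variance via (ss - s*s/n)/(n-1); prone to cancellation."""
    n = len(values)
    s = sum(values)                  # sum of values
    ss = sum(x * x for x in values)  # sum of squared values
    return (ss - s * s / n) / (n - 1)

def two_pass_variance(values):
    """Unbiased variance: compute the mean first, then sum squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))     # not 30: ss and s*s/n cancel almost completely
print(two_pass_variance(data))  # 30.0
```

When the mean is large relative to the spread, ss and s*s/n agree in nearly all their significant digits, so their difference is dominated by rounding error; the two-pass version never performs that subtraction. Note that ss - s*s/n on its own is the sum of squared deviations; dividing it by n - 1 gives the unbiased variance.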
> Steven McKinney, Ph.D.
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
> email: smckinney at bccrc.ca
> tel: 604-675-8000 x7561
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
>> -----Original Message-----
>> From: bioc-devel-bounces at stat.math.ethz.ch [mailto:bioc-devel-bounces at stat.math.ethz.ch] On Behalf Of Wolfgang Huber
>> Sent: Thursday, July 16, 2009 3:10 AM
>> To: Bioconductor Developers
>> Subject: [Bioc-devel] rfc - rowttests in genefilter package
>> I noted in this function (which I wrote) that if the number of samples
>> in each group is large (more than, say, 1000), floating-point errors
>> become significant, to the point of invalidating the results.
>> Essentially, the reason is that I compute the within-group variances
>> ss - s * s / n
>> where ss is the sum of squared values, s is the sum of values, and n
>> is the sample size.
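One of the simpler fixes is to shift the data before accumulating. The quantity ss - s * s / n is mathematically unchanged if a constant K is subtracted from every value, and when K is close to the group mean the two terms no longer nearly cancel. A minimal Python sketch, assuming the first value of the group is used as K (any value near the mean would do); `shifted_ss` is a hypothetical name, not a genefilter function.

```python
def shifted_ss(values):
    """Within-group sum of squared deviations, ss - s*s/n, on shifted data."""
    k = values[0]  # assumed shift: any value near the group mean works
    n = len(values)
    s = sum(x - k for x in values)           # sum of shifted values
    ss = sum((x - k) ** 2 for x in values)   # sum of squared shifted values
    return ss - s * s / n

print(shifted_ss([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))  # 90.0
```

This keeps the single-pass accumulation of s and ss that the original formula uses, which is why it can be a small change to existing code; the catch is that it relies on K being a reasonable guess of the mean.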
>> I've added a warning to the man page asking people only to use the
>> function when the number of samples is dozens to a few hundred. I can
>> think of a few obvious ways to make the code less vulnerable to the
>> finite precision of floating point arithmetic, but I am sure this
>> problem has been solved many times before and would like to ask for
>> pointers or suggestions.
>> Best wishes
>> Wolfgang Huber
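To show where a stable within-group sum of squares would plug in, here is a hedged Python sketch of a row-wise two-sample t statistic. It assumes the standard pooled-variance (equal-variance) t statistic, with the sign taken as the group 0 mean minus the group 1 mean; the function names and the 0/1 group encoding are hypothetical, and this is not genefilter's C implementation.

```python
import math

def centered_ss(values):
    """Sum of squared deviations from the mean, computed in two passes."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def row_t_statistics(rows, groups):
    """Pooled-variance two-sample t statistic for each row.

    rows   -- list of rows, each a list of floats (one value per sample)
    groups -- list of 0/1 labels, one per sample (hypothetical encoding)
    """
    stats = []
    for row in rows:
        a = [x for x, g in zip(row, groups) if g == 0]
        b = [x for x, g in zip(row, groups) if g == 1]
        na, nb = len(a), len(b)
        # pooled within-group variance, assuming equal group variances
        pooled = (centered_ss(a) + centered_ss(b)) / (na + nb - 2)
        diff = sum(a) / na - sum(b) / nb
        stats.append(diff / math.sqrt(pooled * (1 / na + 1 / nb)))
    return stats
```

Adding a large constant to every value in a row leaves the result unchanged here, because each group's sum of squares is taken about its own mean rather than formed as ss - s * s / n.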
>> Bioc-devel at stat.math.ethz.ch mailing list