[R] Ranjan Maitra's comment "Re: Variance is different in R vs. Excel?

Ronald Wyllys wyllys at ischool.utexas.edu
Wed Feb 11 01:09:22 CET 2015


FWIW, both Excel 2007 and LibreOffice 4.2 yield the correct variance for 
the numbers in Ranjan Maitra's HW problem for incoming students in R.  
Namely, both these programs yield a sample variance of 0.2777777778 
(rounded to 10 decimal digits).

Ronald Wyllys


On 02/10/2015 05:00 AM, r-help-request at r-project.org wrote:
> ate: Mon, 9 Feb 2015 17:39:14 -0600
> From: Ranjan Maitra<maitra.mbox.ignored at inbox.com>
> To:<r-help at stat.math.ethz.ch>
> Subject: Re: [R] Variance is different in R vs. Excel?
> Message-ID:<20150209173914.bae4d99ebeadafed3515372c at inbox.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I suspect that this is the long-documented issue with indeed an entire industry -- and publications -- devoted to finding such errors in Excel. Till the 2013 version, it used to be a favorite HW problem of mine. Basically, Excel uses the "short formula" to calculate the variance and the sd. This "short formula" has numerical issues with larger numbers (though I am surprised at the OP's data because these numbers were not that large). Anyway, the "long formula" which removes the mean from each datapoint, squares and sums is preferred with large numbers.
>
> Btw, my HW problem for incoming students in my R class would be this:
>
> Consider the following numbers:
> 100000000000001, 100000000000002, 100000000000001, 100000000000002, 100000000000001,
> 100000000000002, 100000000000001, 100000000000002, 100000000000001, 100000000000002.
>
> Calculate the variance in Excel (gives pure garbage) and in R.
>
> I got this (or may have adapted it) from the book: Numerical Issues in Statistical Computing for the Social Scientist by M. Altman, J. Gill and M. P. McDonald.
>
> After over 10 years, Excel finally appears to have fixed the issue. gnumeric never had this problem.
>
> Best wishes,
> Ranjan



More information about the R-help mailing list