[R] Ranjan Maitra's comment "Re: Variance is different in R vs. Excel?
Ronald Wyllys
wyllys at ischool.utexas.edu
Wed Feb 11 01:09:22 CET 2015
FWIW, both Excel 2007 and LibreOffice 4.2 yield the correct variance for
the numbers in Ranjan Maitra's HW problem for incoming students in R.
Namely, both these programs yield a sample variance of 0.2777777778
(rounded to 10 decimal digits).
Ronald Wyllys
On 02/10/2015 05:00 AM, r-help-request at r-project.org wrote:
> ate: Mon, 9 Feb 2015 17:39:14 -0600
> From: Ranjan Maitra<maitra.mbox.ignored at inbox.com>
> To:<r-help at stat.math.ethz.ch>
> Subject: Re: [R] Variance is different in R vs. Excel?
> Message-ID:<20150209173914.bae4d99ebeadafed3515372c at inbox.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I suspect that this is the long-documented issue with indeed an entire industry -- and publications -- devoted to finding such errors in Excel. Till the 2013 version, it used to be a favorite HW problem of mine. Basically, Excel uses the "short formula" to calculate the variance and the sd. This "short formula" has numerical issues with larger numbers (though I am surprised at the OP's data because these numbers were not that large). Anyway, the "long formula" which removes the mean from each datapoint, squares and sums is preferred with large numbers.
>
> Btw, my HW problem for incoming students in my R class would be this:
>
> Consider the following numbers:
> 100000000000001, 100000000000002, 100000000000001, 100000000000002, 100000000000001,
> 100000000000002, 100000000000001, 100000000000002, 100000000000001, 100000000000002.
>
> Calculate the variance in Excel (gives pure garbage) and in R.
>
> I got this (or may have adapted it) from the book: Numerical Issues in Statistical Computing for the Social Scientist by M. Altman, J. Gill and M. P. McDonald.
>
> After over 10 years, Excel finally appears to have fixed the issue. gnumeric never had this problem.
>
> Best wishes,
> Ranjan
More information about the R-help
mailing list