[Rd] [R] computing the variance
Martin Maechler
maechler at stat.math.ethz.ch
Mon Dec 5 19:10:08 CET 2005
{from R-help, diverted to R-devel}:
UweL> Wang Tian Hua wrote:
UweL> > hi, when i was computing the variance of a simple
UweL> > vector, i found an unexpected result. not sure whether it
UweL> > is a bug.
UweL> Not a bug! ?var:
UweL> "The denominator n - 1 is used which gives an unbiased
UweL> estimator of the (co)variance for
UweL> i.i.d. observations."
UweL> > > var(c(1,2,3))
UweL> > [1] 1 #which should be 2/3.
UweL> > > var(c(1,2,3,4,5))
UweL> > [1] 2.5 #which should be 10/5=2
UweL> >
UweL> > it seems to me that the program uses (sample size -1) instead of sample
UweL> > size at the denominator. how can i rectify this?
UweL> Simply change it by:
UweL> x <- c(1,2,3,4,5)
UweL> n <- length(x)
UweL> var(x)*(n-1)/n
UweL> if you really want it.
It seems Insightful at some point in time have given in to
this user request, and S-plus nowadays has
an argument "unbiased = TRUE"
where the user can choose {to shoot (him/her)self in the leg and}
require 'unbiased = FALSE'.
{and there's also 'SumSquares = FALSE' which allows skipping
the division (by N or N-1) altogether}
Since in some ``schools of statistics'' people are really still
taught to use a 1/N variance, we could consider providing such an
argument to var() {and cov()} as well. Otherwise, people define
their own variance function such as
VAR <- function(x, ...) { N <- length(x); (N-1)/N * var(x, ...) }
Should we?
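
For concreteness, here is a minimal user-level sketch of what such an
argument could look like. The name var2() is hypothetical, and
'unbiased' merely borrows the S-PLUS argument name; nothing here is an
existing R interface, and the sketch assumes a plain numeric vector
without NAs.

```r
# Hypothetical wrapper, NOT part of R: var2() and its 'unbiased'
# argument are illustrative names only.
var2 <- function(x, unbiased = TRUE, ...) {
    v <- var(x, ...)            # var() always divides by N - 1
    if (unbiased) return(v)
    n <- length(x)              # assumes a numeric vector, no NAs
    v * (n - 1) / n             # rescale to the 1/N denominator
}

var2(c(1, 2, 3))                        # same as var(): divides by N - 1
var2(c(1, 2, 3), unbiased = FALSE)      # divides by N
var2(c(1, 2, 3, 4, 5), unbiased = FALSE)
```

With the two vectors from the original question, the 'unbiased = FALSE'
calls give the 2/3 and 2 that the poster expected.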
BTW: S+ even has the 'unbiased' argument for cor() where of course it
really doesn't make any difference (!), and actually I think is
rather misleading, since the sample correlation is not unbiased
in almost all cases AFAICS.
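
A quick check of why the denominator cannot matter for cor(): the
same factor (N-1)/N multiplies the covariance and both variances, so
it cancels in the ratio. The data below are made up purely for
illustration.

```r
# Illustrative data only; any two numeric vectors of equal length work.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)
n <- length(x)
f <- (n - 1) / n    # factor turning a 1/(N-1) estimate into a 1/N one

# correlation built from the usual 1/(N-1) quantities
r_nm1 <- cov(x, y) / sqrt(var(x) * var(y))
# correlation built from the rescaled 1/N quantities
r_n   <- (f * cov(x, y)) / sqrt((f * var(x)) * (f * var(y)))

all.equal(r_nm1, r_n)          # the factor f cancels
all.equal(r_nm1, cor(x, y))    # and both agree with cor()
```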
Martin