[Rd] [R] computing the variance
Duncan Murdoch
murdoch at stats.uwo.ca
Mon Dec 5 20:37:50 CET 2005
On 12/5/2005 2:25 PM, (Ted Harding) wrote:
> On 05-Dec-05 Martin Maechler wrote:
>> UweL> x <- c(1,2,3,4,5)
>> UweL> n <- length(x)
>> UweL> var(x)*(n-1)/n
>>
>> UweL> if you really want it.
>>
>> It seems Insightful at some point in time have given in to
>> this user request, and S-plus nowadays has
>> an argument "unbiased = TRUE"
>> where the user can choose {to shoot (him/her)self in the leg and}
>> require 'unbiased = FALSE'.
>> {and there's also 'SumSquares = FALSE' which allows one to skip
>> the division (by N or N-1) entirely}
>>
>> Since in some ``schools of statistics'' people are really still
>> taught to use a 1/N variance, we could envisage providing such an
>> argument to var() {and cov()} as well. Otherwise, people define
>> their own variance function such as
>> VAR <- function(x, ...) (N-1)/N * var(x, ...)
>> Should we?
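[As a concrete sketch of the wrapper Martin describes, using base R's
var() and taking N from length(x), one might write:]

```r
## Sketch of the 1/N ("population") variance wrapper Martin describes.
## var() divides by N-1, so multiplying by (N-1)/N gives the 1/N version.
VAR <- function(x, ...) {
  n <- length(x)
  (n - 1) / n * var(x, ...)
}

VAR(c(1, 2, 3, 4, 5))  # 2, whereas var() gives 2.5
```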
>
> If people need to do this, such an option would be a convenience,
> but I don't see that it has much further merit than that.
>
> My view of how to calculate a "variance" is based, not directly
> on the "unbiased" issue, but on the following.
>
> Suppose you define a RV X as a single value sampled from a finite
> population of values X1,...,XN.
>
> The variance of X is (or damn well should be) defined as
>
> Var(X) = E(X^2) - (E(X))^2
>
> and this comes to (Sum(X^2) - (Sum(X))^2/N)/(N-1).
I don't follow this. I agree with the first line (though I prefer to
write it differently), but I don't see how it leads to the second. For
example, consider a distribution which is equally likely to be +/- 1,
and a sample from it consisting of a single 1 and a single -1. The
first formula gives 1 (which is the variance), the second gives 2.
The second formula is unbiased because in a random sample I am just as
likely to get a 0 from the second formula (when the two draws happen to
be equal), so its average over samples is 1; but I'm curious about what
you mean by "this comes to".
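[The disagreement above can be checked numerically; a quick sketch in R,
where x is the two-point sample Duncan describes:]

```r
## Duncan's example: a sample of one +1 and one -1.
x <- c(1, -1)
n <- length(x)

## First formula: E(X^2) - (E(X))^2, with expectations taken over the sample.
first <- mean(x^2) - mean(x)^2                  # 1

## Second formula: the usual N-1 sample variance.
second <- (sum(x^2) - sum(x)^2 / n) / (n - 1)   # 2, same as var(x)

## Unbiasedness of the second formula: averaging it over the four
## equally likely samples of size 2 from {+1, -1} recovers the true
## variance, 1.
samples <- list(c(1, 1), c(1, -1), c(-1, 1), c(-1, -1))
mean(sapply(samples, var))                      # 1
```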
Duncan