[Rd] [R] computing the variance

Mon Dec 5 20:37:50 CET 2005

On 12/5/2005 2:25 PM, (Ted Harding) wrote:
> On 05-Dec-05 Martin Maechler wrote:
>>     UweL> x <- c(1,2,3,4,5)
>>     UweL> n <- length(x)
>>     UweL> var(x)*(n-1)/n
>> 
>>     UweL> if you really want it.
>> 
>> It seems Insightful at some point in time have given in to
>> this user request, and S-plus nowadays has
>> an argument  "unbiased = TRUE"
>> where the user can choose {to shoot (him/her)self in the foot and}
>> require 'unbiased = FALSE'.
>> {and there's also 'SumSquares = FALSE', which allows one to avoid
>> any division (by N or N-1)}
>> 
>> Since in some ``schools of statistics'' people are really still
>> taught to use a 1/N variance, we could envisage providing such an
>> argument to var() {and cov()} as well.  Otherwise, people define
>> their own variance function such as  
>>       VAR <- function(x, ...) { N <- length(x); (N-1)/N * var(x, ...) }
>> Should we?
> 
> If people need to do this, such an option would be a convenience,
> but I don't see that it has much further merit than that.
> 
> My view of how to calculate a "variance" is based, not directly
> on the "unbiased" issue, but on the following.
> 
> Suppose you define a RV X as a single value sampled from a finite
> population of values X1,...,XN.
> 
> The variance of X is (or damn well should be) defined as
> 
>   Var(X) = E(X^2) - (E(X))^2
> 
> and this comes to (Sum(X^2) - (Sum(X))^2/N)/(N-1).

I don't follow this.  I agree with the first line (though I prefer to 
write it differently), but I don't see how it leads to the second.  For 
example, consider a distribution which is equally likely to be +/- 1, 
and a sample from it consisting of a single 1 and a single -1.  The 
first formula gives 1 (which is the variance), the second gives 2.
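A minimal R sketch of this arithmetic, assuming the first formula treats the N sample values as equally likely outcomes (so E() becomes mean()) and reading the second as the usual (N-1)-divisor formula. Because Sum(X) = 0 in this example, the Sum(X) term vanishes, so exactly where N appears inside that term does not change the result. The names `first` and `second` are illustrative only.

```r
# Duncan's example: a sample consisting of a single 1 and a single -1
x <- c(1, -1)
N <- length(x)

# First formula, E(X^2) - (E(X))^2, with the N values equally likely
first <- mean(x^2) - mean(x)^2

# Second formula, read as the usual (N-1)-divisor form
second <- (sum(x^2) - sum(x)^2 / N) / (N - 1)

first   # 1
second  # 2
var(x)  # 2, since R's var() also divides by N-1
```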

The second formula is unbiased because in a random sample of two I am just as 
likely to draw two equal values, for which it gives 0, so its average over 
samples is 1.  But I'm curious about what you mean by "this comes to".
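The unbiasedness argument can be checked by enumeration. This is a sketch assuming the two draws are independent, so each of the four ordered pairs from {-1, +1} has probability 1/4; averaging over them gives the expected value of each estimator. The N-divisor version is obtained by rescaling with (N-1)/N, as in Uwe's snippet.

```r
# All samples of size 2 from a distribution equally likely to be +1 or -1,
# assuming independent draws: each of the 4 ordered pairs has probability 1/4
samples <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1))

# (N-1) divisor: R's var() applied to each sample (row); values 0, 2, 2, 0
v_unbiased <- apply(samples, 1, var)

# N divisor, i.e. var(x) * (N-1)/N; values 0, 1, 1, 0
N <- 2
v_n <- v_unbiased * (N - 1) / N

mean(v_unbiased)  # 1: equals the true variance, so unbiased
mean(v_n)         # 0.5: underestimates the true variance
```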

Duncan