# [R] Problem with Weighted Variance in Hmisc

Tom La Bone labone at gforcecable.com
Fri Jun 1 14:29:00 CEST 2007

```Wonderful!  Thanks for the assistance.

Tom La Bone

-----Original Message-----
From: jiho [mailto:jo.irisson at gmail.com]
Sent: Friday, June 01, 2007 8:17 AM
To: labone at gforcecable.com
Cc: 'R-help'
Subject: Re: [R] Problem with Weighted Variance in Hmisc

On 2007-June-01  , at 13:00 , Tom La Bone wrote:
> The equation for weighted variance given in the NIST DataPlot
> documentation
> is the usual variance equation with the weights inserted.  The
> weighted
> variance of the weighted mean is this weighted variance divided by N.
>
> There is another approach to calculating the weighted variance of the
> weighted mean that propagates the uncertainty of each term in the
> weighted
> mean (see Data Reduction and Error Analysis for the Physical
> Sciences by
> Bevington & Robinson).  The two approaches do not give the same
> anyone suggest a reference that discusses the merits of the DataPlot
> approach versus the Bevington approach?

I am no expert but I did a little research on the subject and it
seems there is no analytical equivalent to the standard error of the
mean in weighted statistics (i.e. there is no standard error/variance
of the weighted mean). That's why you find so many formulas: they are
all numerical approximations that make more of less sense. If you
have access to DcienceDirect there is a paper which compares 3 of
those analytical approximations to the variance estimated by bootstrap:

@article{Gatz1995AE,
Author = {Gatz, Donald F and Smith, Luther},
Date-Added = {2007-05-19 14:15:58 +0200},
Date-Modified = {2007-06-01 14:12:17 +0200},
Filed = {Yes},
Journal = {Atmospheric Environment},
Number = {11},
Pages = {1185--1193},
Rating = {0},
Title = {The standard error of a weighted mean concentration--I.
Bootstrapping vs other methods},
Url =
{http://www.sciencedirect.com/science/article/B6VH3-3YGV47C-6X/
2/18b627259a75ff9b765410aaa231e352},
Volume = {29},
Year = {1995},
Abstract = {Concentrations of chemical constituents of precipitation

are frequently expressed in terms of the precipitation-weighted mean,
which has several desirable properties. Unfortunately, the weighted
mean has no analytical analog of the standard error of the arithmetic
mean for use in characterizing its statistical uncertainty. Several
approximate expressions have been used previously in the literature,
but there is no consensus as to which is best. This paper compares
three methods from the literature with a standard based on
bootstrapping. Comparative calculations were carried out for nine
major ions measured at 222 sampling sites in the National Atmospheric
Deposition/National Trends Network (NADP/NTN). The ratio variance
approximation of Cochran (1977) gave results that were not
statistically different from those of bootstrapping, and is suggested
as the method of choice for routine computing of the standard error
of the weighted mean. The bootstrap method has advantages of its own,
including the fact that it is nonparametric, but requires additional
effort and computation time.}}

The analytical formula which is the closest to the bootstrap  is this
one:

var.wtd.mean.cochran <- function(x,w)
#
#	Computes the variance of a weighted mean following Cochran 1977
definition
#
{
n = length(w)
xWbar = wtd.mean(x,w)
wbar = mean(w)
out = n/((n-1)*sum(w)^2)*(sum((w*x-wbar*xWbar)^2)-2*xWbar*sum((w-
wbar)*(w*x-wbar*xWbar))+xWbar^2*sum((w-wbar)^2))
return(out)
}

It's the one I am retaining.

NB: the part two of the paper cited above may also interest you:

@article{Gatz1995AEa,
Author = {Gatz, Donald F and Smith, Luther},
Date-Added = {2007-05-19 14:15:58 +0200},
Date-Modified = {2007-06-01 12:11:53 +0200},
Filed = {Yes},
Journal = {Atmospheric Environment},
Number = {11},
Pages = {1195--1200},
Title = {The standard error of a weighted mean concentration--II.
Estimating confidence intervals},
Url =
{http://www.sciencedirect.com/science/article/B6VH3-3YGV47C-6Y/
2/a187487377ef52b741e3dabdfca97517},
Volume = {29},
Year = {1995},
Abstract = {One motivation for estimating the standard error, SEMw,

of a weighted mean concentration, Mw, of an ion in precipitation is
to use it to compute a confidence interval for Mw. Typically this is
done by multiplying the standard error by a factor that depends on
the degree of confidence one wishes to express, on the assumption
that the weighted mean has a normal distribution. This paper compares
confidence intervals of Mw concentrations of ions in precipitation,
as computed using the assumption of a normal distribution, with those
estimated from distributions produced by bootstrapping. The
hypothesis that Mw was normally distributed was rejected about half
the time (at the 5{\%} significance level) in tests involving nine
major ions measured at ten diverse sites in the National Atmospheric
Deposition Program/National Trends Network (NADP/NTN). Most of these
rejections occurred at sites with fewer than 100 samples, in
agreement with previous results. Nevertheless, the hypothesis was
often rejected at sites with more than 100 samples as well. The
maximum error (relative to Mw) in the 95{\%} confidence limits made
by assuming a normal distribution of the Mw at the ten sites examined
was about 27{\%}. Most such errors were less than 10{\%}, and errors
were smaller at sampling sites with > 100 samples than at those with
< 100 samples.}}

Cheers,

JiHO
---
http://jo.irisson.free.fr/

```