# [R] Hmisc package: deff() command's formula for the design effect

Thomas Lumley tlumley at u.washington.edu
Thu May 7 00:13:42 CEST 2009

```
On Wed, 6 May 2009, jjh21 wrote:

>
> Hello,
>
> I have been using the Hmisc package's deff() command for some research with
> clustered data. I noticed that the formula to calculate the design effect
> seems a bit different. The formula for the DE is:
>
> 1 + rho*(B - 1)
>
> In most resources I have seen the formula for B to simply be the average
> number of observations in a cluster: n/k if n is the total sample size and k
> is the number of clusters.
>
> However, the deff() command calculates B as: sum(number of observations in
> each cluster^2)/n.
>
> That is a bit hard to write without the Sigma operator. In English it is
> "squaring the number of observations in each cluster, adding all those up,
> and dividing that total by n."
>
> Which formula is correct? Thank you!

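The formula in Hmisc is correct (if the correlation doesn't vary with the
cluster size).  If you think of the formula for the variance of a sum, it
involves adding up all the variances and covariances.  A cluster with m
members has m^2-m covariances between members, so the total number of
covariances is sum(m^2-m) over all the clusters, plus the sum(m) = n
variances.  Each covariance is rho times a variance, so relative to n
independent observations the variance is inflated by
1 + rho*(sum(m^2)/n - 1), which is the formula with B = sum(m^2)/n.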

Another way to think of it is that the larger clusters get too much
weight, so in addition to the rho*(B-1) factor that you would have for
equal-sized clusters there is an additional loss of efficiency due to
giving too much weight to the larger clusters.

-thomas

```
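The counting argument in the reply can be checked numerically. The following is a minimal Python sketch, not the Hmisc implementation: the function names and the example cluster sizes are hypothetical, and it assumes a common variance and a single intra-cluster correlation rho that does not depend on cluster size. It computes B both ways and compares the resulting design effect with one obtained by counting variances and covariances directly.

```python
def b_average(cluster_sizes):
    """B as the plain average cluster size: n / (number of clusters)."""
    n = sum(cluster_sizes)
    return n / len(cluster_sizes)


def b_weighted(cluster_sizes):
    """B as sum(size^2) / n, the form described in the question for deff()."""
    n = sum(cluster_sizes)
    return sum(m * m for m in cluster_sizes) / n


def design_effect(b, rho):
    """Design effect 1 + rho * (B - 1)."""
    return 1 + rho * (b - 1)


def design_effect_by_counting(cluster_sizes, rho):
    """Variance of the sum relative to n independent observations.

    With unit variance there are n variance terms and sum(m^2 - m)
    covariance terms, each covariance equal to rho.
    """
    n = sum(cluster_sizes)
    n_covariances = sum(m * m - m for m in cluster_sizes)
    return (n + rho * n_covariances) / n


# Hypothetical unequal cluster sizes and correlation, for illustration only.
sizes = [2, 4, 6]
rho = 0.1
print("average-size B: ", design_effect(b_average(sizes), rho))
print("size-weighted B:", design_effect(b_weighted(sizes), rho))
print("direct counting:", design_effect_by_counting(sizes, rho))
```

Under these assumptions the size-weighted B reproduces the direct count, while the average-size B gives a smaller design effect whenever cluster sizes differ. With equal-sized clusters the two definitions of B coincide.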