[R-sig-ME] Inflated t-values when using weights with inadequate sample size in lme4?

Mon Mar 28 19:15:09 CEST 2011

A colleague approached me with concerns that there is clustering in her dataset that is unaddressed when using conventional OLS regression.  She had been using GEE, but she expressed a desire to replicate her analysis using mixed-effects modeling in part because her GEE package doesn't generate AIC measures (I don't use GEE, so I couldn't comment).

The tricky part is that her sample size is so small.  She has 28 people nested within 15 households, so several households are represented by a single member.  A scatterplot nevertheless suggests within-household correlation in the households with multiple members.

So although I cautioned her that the estimated household-level variance would be practically useless, I thought it might be worthwhile to see how a mixed-effects model changed the estimates of fixed effects in comparison to a conventional OLS model.

An added consideration in her model is that the second explanatory variable is a proportion, weighted by the number of times she was able to record the data for that person.

The response variable is continuous, and I therefore specified this model:

mixed.model <- glmer(Y ~ Age + X2 + (1 | House),
	data = AD, REML = TRUE,
	weights = n_X2,  # placeholder name: the column holding the sample size of X2
	verbose = TRUE)

Strangely, the t-values produced by this model were all about 10 times higher than I would have expected based on the lm summary.  For example, the t-value of "Age" went from 2.9 to 26.5.  The same holds true if I re-run the model using ML (i.e., REML = F).

When I re-specify the model without weights, however, the t-values are generally comparable to the lm estimates, albeit a little more conservative, as I would have expected.  (I get similar results when re-running the model in MLwiN.)

Any idea why the specification of weights would lead to the apparent inflation of t-values?
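
A check that might help narrow this down is sketched below.  It asks whether the t-values depend on the scale of the weights: in lm() the weights matter only up to a constant factor, so rescaling them to mean 1 should leave the t-values unchanged, and the same comparison can then be run on the mixed model.  Everything here is an assumption for illustration: the data are simulated stand-ins (AD_sim, with made-up household sizes and a hypothetical weight column n_X2), t_lm and t_mixed are hypothetical helpers, and the mixed model is fitted with lmer() since the response is continuous.  The lme4 part runs only if the package is installed.

```r
# Simulated stand-in data: 28 people in 15 households (sizes are made up).
set.seed(1)
house_sizes <- c(rep(1, 5), rep(2, 7), rep(3, 3))   # 5 + 14 + 9 = 28 people
AD_sim <- data.frame(
  House = factor(rep(seq_along(house_sizes), times = house_sizes)),
  Age   = round(runif(28, 20, 70)),
  X2    = runif(28),                        # a proportion
  n_X2  = sample(5:40, 28, replace = TRUE)  # hypothetical number of recordings behind X2
)
house_effect <- rnorm(15, sd = 1)
AD_sim$Y <- 2 + 0.05 * AD_sim$Age + 1.5 * AD_sim$X2 +
  house_effect[as.integer(AD_sim$House)] + rnorm(28, sd = 1)

# t-values from a weighted lm() fit for a given weight vector
t_lm <- function(w) {
  d <- AD_sim
  d$w <- w
  fit <- lm(Y ~ Age + X2, data = d, weights = w)
  coef(summary(fit))[, "t value"]
}

t_raw    <- t_lm(AD_sim$n_X2)
t_scaled <- t_lm(AD_sim$n_X2 / mean(AD_sim$n_X2))  # weights rescaled to mean 1

# For lm() the two sets of t-values should agree up to rounding error.
print(all.equal(t_raw, t_scaled))

# Same comparison for the mixed model, if lme4 is available.
if (requireNamespace("lme4", quietly = TRUE)) {
  t_mixed <- function(w) {
    d <- AD_sim
    d$w <- w
    fit <- lme4::lmer(Y ~ Age + X2 + (1 | House), data = d,
                      weights = w, REML = TRUE)
    coef(summary(fit))[, "t value"]
  }
  print(rbind(raw    = t_mixed(AD_sim$n_X2),
              scaled = t_mixed(AD_sim$n_X2 / mean(AD_sim$n_X2))))
}
```

If the two rows of mixed-model t-values differ noticeably on the real data while the lm() ones do not, that would point toward how the weights are scaled or interpreted rather than toward the small sample alone.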

As an aside, we both recognized that the sample size was generally inadequate for mixed-effects modeling, but we're curious to what extent the anomalous results are attributable to the sample size as compared to other possible explanations.

Many thanks,
Jeremy