[R] analyzing cluster sample
Paul von Hippel
von-hippel.1 at osu.edu
Tue Aug 24 23:47:15 CEST 2004
I am analyzing a survey where ~20,000 cases were sampled in ~1000 clusters.
I would like to analyze the data using, for example, gam. What is the best
way to account for the clustering? I've tried including the cluster ID as a
factor in the model formula, but gam then tries to estimate a separate
fixed effect for each cluster, which with 1000 clusters is prohibitively
time-consuming. What I want instead is an estimate of the
variance due to clusters, or perhaps an intraclass correlation, and
cluster-adjusted standard errors for the effects of other variables in the
model.
I expect I can account for clustering by using lme with clusters as a
random effect, but then I can't use the flexible smooths available in gam.
If it's not possible to get both clustering and smooths, I may use gam and
adjust the standard errors using an estimate of the design effect.
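A design-effect adjustment like the one mentioned above is often based on Kish's approximation, deff = 1 + (m - 1) * rho. Here m is the average cluster size (about 20,000 / 1000 = 20 in this survey) and rho is the intraclass correlation. Standard errors computed as if the sample were a simple random sample are then multiplied by sqrt(deff). The Python sketch below assumes roughly equal cluster sizes. The function names and the ICC value of 0.05 are hypothetical, chosen for illustration. For regression coefficients this is only an approximation, since the relevant correlation also depends on how strongly the predictor itself clusters.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Kish approximation to the design effect for (roughly) equal-sized clusters."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def cluster_adjusted_se(naive_se, avg_cluster_size, icc):
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    return naive_se * math.sqrt(design_effect(avg_cluster_size, icc))


m_bar = 20_000 / 1_000  # about 20 cases per cluster, as in the survey described
icc = 0.05              # hypothetical ICC, for illustration only
print(design_effect(m_bar, icc))
print(cluster_adjusted_se(1.0, m_bar, icc))
```

Even a modest ICC matters with clusters of about 20. The ICC enters multiplied by (m - 1), so standard errors can be noticeably understated if clustering is ignored.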
Many thanks for any advice,
Paul
Paul von Hippel
Department of Sociology / Initiative in Population Research
Ohio State University