[R] Survey analysis of repeated relationships?
tlumley at u.washington.edu
Fri Aug 20 19:28:45 CEST 2004
On Fri, 20 Aug 2004, "Jens Oehlschlägel" wrote:
> I just discovered the great piece of software that is available with the
> survey package. Many thanks and 'Hats off' to Thomas Lumley.
> While package survey covers analysis of features of objects sampled (in
> clusters, strata) I could not find analysis of features of repeated
> relationships between sampled objects (in clusters, strata). My
> understanding is that it is not adequate to treat relationships as sampled
> by themselves, because correlations between relationships introduced by
> repeated involvement of the same objects would underestimate variability.
At first I thought you were just interested in longitudinal survey data,
which would be straightforward. This doesn't look straightforward.
I don't know what standard methods there are in survey statistics for
this, but I have looked at this problem in non-survey contexts, and the
linearisation and reweighting arguments seem to transfer in most cases,
so this might be helpful if you don't get anything more focused.
The basic units of analysis are binary relationships between some sampled
individuals, and your data are measurements on these relationships, rather
than being the presence or absence of such relationships. Now, if
individuals are independent then two relationships will be independent as
long as they share no individuals. Arguments remarkably similar to those
for time series in my previous message this morning say that a "sandwich"
variance estimator can be obtained in three parts. Using your example of
talks between a speaker and a listener:
    variance using speaker as PSU
  + variance using listener as PSU
  - variance using speaker:listener combination as PSU
This uses an estimate of variance based on summing individual cross
products of residuals over all pairs of observations not known to be
independent. There is a good heuristic case for leaving out the third
term at least in situations where the speaker and listener factors are not
too far from orthogonal.
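
To make the three-part calculation concrete, here is a minimal sketch for the simplest case, an unweighted mean of a measurement taken on each talk. It is an illustration under stated assumptions, not the survey package's implementation: the function names are hypothetical, all pairs get equal weight, there is no finite-sample or stratification correction, and speaker and listener are treated as two separate roles.

```python
from collections import defaultdict


def cluster_variance(residuals, cluster_ids):
    """Sandwich-type variance of a mean, treating each distinct id as a PSU.

    Sums residuals within each cluster, squares the totals, and divides
    by n^2. Assumption: equal weights, no small-sample correction.
    """
    totals = defaultdict(float)
    for r, c in zip(residuals, cluster_ids):
        totals[c] += r
    n = len(residuals)
    return sum(t * t for t in totals.values()) / (n * n)


def three_part_variance(speakers, listeners, values):
    """Return (mean, variance) using speaker + listener - speaker:listener."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    v_speaker = cluster_variance(resid, speakers)
    v_listener = cluster_variance(resid, listeners)
    # Observations sharing both speaker and listener were counted twice above,
    # so the speaker:listener term is subtracted once.
    v_pair = cluster_variance(resid, list(zip(speakers, listeners)))
    return mean, v_speaker + v_listener - v_pair


# Hypothetical toy data: five talks, each rated 1 (good) or 0 (not good).
speakers = ["a", "a", "b", "b", "c"]
listeners = ["x", "y", "x", "y", "x"]
values = [1, 0, 1, 1, 0]
mean, variance = three_part_variance(speakers, listeners, values)
print(mean, variance)
```

By inclusion-exclusion the result equals the sum of residual cross products over every pair of observations that share a speaker or share a listener, divided by n^2. Dropping the third term, as suggested above, means simply not subtracting `v_pair`.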
The sampling weights for a pair are needed. If individuals are sampled
independently the weight for a pair is the product of the individual weights.
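
As a rough illustration of pair weights under independent sampling, the sketch below multiplies the two individual sampling weights and uses the result in a weighted fraction of 'good' talks. The function names and the toy numbers are hypothetical, and the ratio-type estimator is only one plausible choice for the "fraction of good talks" target; it also assumes each individual's sampling weight is already known.

```python
def pair_weight(weights, a, b):
    """Weight for the pair (a, b), assuming a and b were sampled independently."""
    return weights[a] * weights[b]


def weighted_fraction(pairs, outcomes, weights):
    """Weighted fraction of pairs with outcome 1, using product pair weights."""
    w = [pair_weight(weights, a, b) for a, b in pairs]
    return sum(wi * yi for wi, yi in zip(w, outcomes)) / sum(w)


# Hypothetical individual sampling weights (inverse inclusion probabilities).
weights = {"a": 2.0, "b": 4.0, "c": 1.0}
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
outcomes = [1, 0, 1]  # 1 = talk rated 'good'
print(weighted_fraction(pairs, outcomes, weights))
```

If individuals are not sampled independently (for example, two routers drawn from the same sampled town), the product no longer gives the pair's inverse inclusion probability and the joint probability would be needed instead.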
The sandwich estimator is simple, but even in model-based terms the theory
for this is not entirely straightforward. Details are at
> Can anyone point me to software/literature/people dealing with estimating
> variance / estimating sample size of such surveys?
> Best regards
> P.S. two examples follow
> Professional Talks
> Strata: Professions
> Sampled Objects: Professionals
> Repeated Relationships: Make some of them talk to each other such that
> one Professional is involved in several talks
> (to different other professionals and possibly
> several times to the same)
> Features: Binary evaluation of talk
> Analysis target: Fraction of 'good' talks
> - for each combination of professions
> - in overall population
> TCP Network Time
> Strata: Geographic regions
> Sampled Clusters: Towns
> Sampled Objects: Routers
> Features: packet travel times (or hop numbers)
> Analysis target: Average travel times
> - for each combination of towns
> - for each combination of geographic regions
> - in the overall population
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle