[R] Trends for many units

Sat Jan 6 13:18:59 CET 2001

Hi,

As I read this thread. I can not stop to think that we rarely see this kind
of discussion here and even less in books. Since most of the time, question
like these just die out because they are a bit out of line in a manner of
speaking. I wonder if it would be possible to either accept gracefully more
of these kind of questions here or maybe (just maybe) create a new list
dedicated to discussion on applied statistics using R or something along
those lines.

Just a thought.

Yves Gauvreau
B.E.F.P. Universite du Quebec a Montreal
cyg at sympatico.ca

> -----Message d'origine-----
> De : owner-r-help at stat.math.ethz.ch
> [mailto:owner-r-help at stat.math.ethz.ch]De la part de Prof Brian D Ripley
> Envoye : Friday, January 05, 2001 6:45 PM
> A : s-luppescu at uchicago.edu
> Cc : R Help List
> Objet : Re: [R] Trends for many units
>
>
> On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:
>
> > On 05-Jan-2001 Prof Brian D Ripley wrote:
> > > On Fri, 5 Jan 2001 s-luppescu at uchicago.edu wrote:
> > >
> > >> I have data on every grade in all elementary schools in
> Chicago over 5
> > >> years. I
> > >> would like to estimate a trend over time for each grade in
> each school.
> > >> There
> > >> are 17,600 data all together (about 460 schools, nearly 8
> grades each, over
> > >> 5
> > >> years). Is there a not-so-hard way to do this in R (I was
> thinking of using
> > >> rlm)?
> > >
> > > And the statistical model is?  5 years is a short series, and
> I would have
> > > thought a multilevel model was appropriate (and in R that
> means using lme).
> > > I'l leave it to someone who understands the terms (grades are
> a response
> > > in my terminology) to suggest a model.
> >
> > Yes, there may be some ambiguity in the terminology. ``Grade''
> refers to the
> > year in school (as in, ``first grade'', ``second grade'',
> etc.). Here is a small
> > portion of the data set:
> >
> > Unit     Year      Grade     Pct.Excl
> > 2010      1996         1       0.0789
> > 2010      1997         1       0.0000
> > 2010      1998         1       0.1034
> > 2010      1999         1       0.0286
> > 2010      2000         1       0.0000
> > 2010      1996         2       0.1471
> > 2010      1997         2       0.1282
> > 2010      1998         2       0.0250
> > 2010      1999         2       0.0800
> > 2010      2000         2       0.0588
> > 2010      1996         3       0.0938
> > 2010      1997         3       0.2188
> > 2010      1998         3       0.2000
> > 2010      1999         3       0.1020
> > 2010      2000         3       0.1000
> >
> > Unit is the school number. Basically, I want to do something like:
> > rlm(Pct.Excl ~ Year) for each Unit-Grade combination.
> >
> > > rlm and friends assume independent errors, which looks dubious here.
> >
> > I chose rlm because with the small number of data points (max of 5 per
> > school-grade) a single outlier can have a very large influence.
> I don't know
> > why errors shouldn't be independent here, but I'm willing to be
> convinced.
>
> Wouldn't you expect two grades in one school to be more similar than two
> grades in different schools?  And would not slopes for different grades
> in one school be more similar than across schools?  Those translate into
> dependence.
>
> For an lm-type model you can circumvent this by treating all the
> school-grade combinations as fixed effects.  Thus
>
> (r)lm(Pct.Excl ~ Unit*Grade*Year)
>
> (and I would centre Year on 1998) fits 3520 lines with a common assumed
> error variance. That's a lot of parameters to fit in one go, and you will
> probably find lmList in package nlme helpful.  But my suggestion for a
> model is
>
> i Unit
> j Grade
> t Year
>
> y_{ijt} = mu + beta_j + gamma * t + eta_i + zeta_{ij} + epsilon_{ijt}
>
> eta, zeta, epsilon iid with common variances in each group.
>
> that is fixed effects for Grade, random effects for Unit and Unit | Grade.
> You may or may not need additional random effects
>
> lambda_i * Year + kappa_{ij} * Year
>
> As set up here, independence of all the rvs is plausible, but lme does not
> require it.  The predict.lme will give you BLUP lines for each Unit-Grade
> combination, and they will not be the fitted values in the fixed-effects
> model.  Most social statisticians I know (and we have have some local
> stars) think that the second is more valuable, and routinely use it.
>
> Snijders, T.A.B. and Bosker, R.J. (1999) Multilevel Analysis. Sage.
>
> have an example of IQ tests adminstered to students in classes in schools
> done in exactly this way.  (And that was the ref the experts recommended
> for social applications.)
>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
> -.-.-.-.-.-.-
> r-help mailing list -- Read
> http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
> _._._._._._._
>

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._