[R-sig-ME] Problem with fitted function

Mon Oct 31 17:00:09 CET 2011

You're email is a bit confusing. Let's keep this simple and draw with a distinction between lm and lmer in terms of what is a fitted value.

In least squares regression, your model is

Y = XB + e

Where Y is your outcome, X is the model matrix for the fixed effects, B is the vector of coefficients, and e is the error term. 

And the fitted values are then

\hat{Y} = X\hat{B}

Where \hat{B} is the vector of estimated fixed effects.

In mixed models, your model is

Y = XB + Zu + e

Where Y is your outcome, X is the model matrix for the fixed effects, B is the vector of coefficients, Z is the model matrix for the random effects, u are the BLUP, and e is the error term.

Now, the fitted function in lmer yields

\hat{Y} = X\hat{B} + Z\hat{u}

Where \hat{B} is the vector of estimated fixed effects and now \hat{u} are the predictions of the random effects.

> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
> bounces at r-project.org] On Behalf Of Ira Sharenow
> Sent: Saturday, October 29, 2011 7:39 PM
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Problem with fitted function
> 
> I am new to mixed effects and library(lme4). But I have spent quite a bit of
> time trying to understand Prof. Bates's online book.
> 
> 
> 
> I have data from the California Department of Education. I have Academic
> Performance Index (API) scores for 677 schools and I have data from 2002
> through 2009. The Years are coded 0:7.
> 
> 
> 
> API response variable, 200 <=API <= 1000
> 
> Years, 0:7
> 
> CDS, ID numbers, coded as factors
> 
> AVGED, average parental education 1 <=AVGED<=5
> 
> 
> 
> For each year and for each school, I would like to predict API scores. I
> cannot figure out a model that works. My main concern is with fitted().
> 
> 
> 
> For each year, I did standard lm() analyses. The data is not perfectly normal
> but various adjustments made little difference for my purposes. I am satisfied
> with my lm() analyses.
> 
> 
> 
> Specifically, I am attempting to predict the API score for a school, AHS.
> 
> 
> 
> In order to get started I tried,
> 
> 
> 
> model4 = lmer(API ~ 1 + (1 | Years) + (1 | CDS), data = AVGEDAPI0209,
> na.action = na.omit)
> 
> 
> 
> > fixef(model4)
> 
> (Intercept)
> 
>    701
> 
> 
> 
> $Years
> 
>    0          1          2        3       4    5     6      7
> 
> -55      -30       -18        5     12   17    29   40
> 
> 
> 
> $AHS # actually hundreds of numbers were produced and I picked out the AHS ID
> and the associated value
> 
> 124
> 
> 
> 
> So I performed a simple calculation
> 
> 
> 
> # intercept + AHS effect + Years effect
> 
> expAHSScoreBase = numeric(8)
> 
> expAHSScoreBase =  701 + 124 + c(-55, -30,-18,5,
> 
>    12, 17,29, 40)
> 
> 
> 
> > expAHSScoreBase
> 
> [1] 770   795   807   831   838  843   854   866
> 
> 
> 
> I then computed the predicted scores.
> 
> > fittedAHSScore = fitted(model4)[AVGEDAPI0209$CDS == AHSCDS]
> 
> > fittedAHSScore
> 
> [1] 770   795   807   831   838   798   933   761
> 
> 
> 
> 
> 
> 
> 
> The first five scores match, but the last three differ by a significant
> amount. Can someone please offer some advice? Why are those final three
> numbers so different? And my real goal is to predict API scores from AVGED.
> 
> 
> 
> I then tried the model I think I want,
> 
> 
> 
> model5 = lmer(API ~ 1 + AVGED + (1 | Years) + (1 | CDS), data = AVGEDAPI0209,
> na.action = na.omit)
> 
> 
> 
> 
> 
> Unfortunately, when I tried to fit, I again obtained unexpected results
> 
> fittedAHSScore5 = fitted(model5)[AVGEDAPI0209$CDS == AHSCDS]
> 
> > fittedAHSScore5
> 
> [1] 768 794 806 832 837 804 931 763
> 
> 
> 
> However when I added up the pieces, intercept + (AVGED parameter * actual
> value) + Years effect + CDS effect
> 
> 
> 
> I think I got what I wanted.
> 
> 
> 
> > expAHSScoreBase5 = 574 + AVGED5 + 69 + Years5
> 
> > expAHSScoreBase5
> 
> [1] 768   794   806   832   837   842   855   868
> 
> 
> 
> By doing separate lm() regressions for each year, I obtained the following
> predicted scores,
> 
> 
> 
>     800    813   821   838  842    844    854   868
> 
> 
> 
> Finally, I tried the following models and in each case the fitted values
> produced the same pattern of values.
> 
> 
> 
> model6 = lmer(API ~ 1 + Years + (1 + Years| CDS), data = AVGEDAPI0209,
> na.action = na.omit, REML = 0)
> 
> 
> 
> model6AVGED = lmer(API ~ 1 + AVGED + Years + (1 + Years| CDS), data =
> AVGEDAPI0209, na.action = na.omit, REML = 0)
> 
> 
> 
> model7 = lmer(API ~ 1 + Years + (1 | CDS) + (-1 + Years| CDS), data =
> AVGEDAPI0209, na.action = na.omit, REML = 0)
> 
> 
> 
> 
> 
> model7AVGED = lmer(API ~ 1 + AVGED + Years + (1 | CDS) + (-1 + Years| CDS),
> data = AVGEDAPI0209, na.action = na.omit, REML = 0)
> 
> 
> 
> 
> 
> Thanks.
> 
> 
> 
> Ira Sharenow
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models