[R-sig-ME] SE for predictions in lme

Thu Jan 12 23:15:47 CET 2012

On 12/01/2012, at 10:03 PM, marta rufino wrote:

> Hi,
> 
> This topic has been discussed in several posts in this list before, but I
> could not completely understand if there was a clear answer. Also, I
> searched the books (Pinheiro, Faraway and Zuur) I could not reach a
> conclusion. I guess I am missing some point here :) so I am sorry for
> recover this topic again.
> I will start by providing an example code and then put my questions, as
> clear as possible.
> 
> Following the code provided in previous posts (see for example):
> http://markmail.org/message/lhtols3t5wrleewc
> using another dataset, for example:
> 
>            # Example to ask the list
>            data(iris)
>            kk=data.frame(stack(iris[,1:3]),sp=rep(iris$Species),
> subject=rep(1:dim(iris)[1],3))
>            summary(kk)
>            require(nlme)
>            kk2=lme(values~ind*sp, kk, weights = varIdent(form= ~ 1|ind),
> random=~1|subject)
>            kk2
>            anova(kk2) #all factors are significant.
> 
>            # Calculate model predictions    and plot it ==== this is from
> Ben's code
>                newdat <- expand.grid(ind=levels(kk$ind), sp=levels(kk$sp))
>                newdat$pred <- predict(kk2, newdat, level = 0)
> 
>                Designmat <- model.matrix(eval(eval(kk2$call$fixed)[-2]),
> newdat[-3])
>                predvar <- diag(Designmat %*% kk2$varFix %*% t(Designmat))
>                newdat$SE <- sqrt(predvar)

Prediction for a new observation on one of the subjects in the sample data

>                newdat$SE2 <- sqrt(predvar+kk2$sigma^2)

Prediction for a new observation on a new subject

Exercise: What is the SE for the average of 3 observations on a new subject?

(It is the way the SE changes depending on what exactly one is predicting
that raises questions about what exactly AIC and related statistics are 
designed to optimise, when such statistics are used to compare models.)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

>                #install.packages("ggplot2")
>                library(ggplot2)
>                pd <- position_dodge(width=0.4)
>                ggplot(newdat,aes(x=ind, y=pred, colour=sp))+
>                geom_point(aes(x=kk$ind,y=kk$values, colour=kk$sp),
> size=.3, shape=2, position=pd)+
>                geom_point(position=pd)+
>                geom_linerange(aes(ymin=pred-2*SE2,
> ymax=pred+2*SE2),col="red", position=pd)+
>                geom_errorbar(aes(ymin=pred-2*SE, ymax=pred+2*SE),
> col="black", width=.1, position=pd)
> 
> So, we can see the actual points, the SE and SE2 modelled and respective
> means, I think.
> My questions are:
> What is exactly the SE and SE2 (how do we call it in an article legend, for
> example) and how can these be interpreted?
> Can we consider that when the 'lines' do not overlap the species (or
> varieties) are different?
> The SE2 are larger than the actual data in setosa/Petal.Length and overlap
> in the remaining variaties--- how can these show significant differences
> than?
> 
> I think I am getting confused or I am missing something important, because
> the results do not appear congruent in my head.
> 
> Any help will be much appreciated,
> 
> All the best,
> Marta
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models