[R-sig-ME] problem fitting a mixed model due to between subjectvariation

Wed Jul 28 10:24:45 CEST 2010

Dear David,

Thanks for the reply. I wish it were homework... These are not my own
data and the people who provided me with the data gave me a choice:
either I could use their data but then it should be impossible to trace
back to any content matter, or I should generate data and then I could
explain where the data came from. I chose the first option...

I also looked at the random slopes model, but since it gave a
correlation of -1 I first tried to solve the simpler case.

Indeed from a graphical inspection (not as nice as yours) I also
concluded that there really should not be any effect of pred for
individual subjects. I thought that conclusion was even valid for
clusters of all sizes. But for the people who provided me with the data,
this is only a starting model; they want to look at the effect of
additional covariates, so a conclusion like 'there is no effect of pred'
is not enough.

Restricting the analysis to the large clusters only, things do not seem
to become simpler. Following your example:
# t1 <-  table(strangedat$subject)
strangedatlargeclust <- strangedat[strangedat$subject %in%
rownames(t1)[t1 == 6],]
strangedatlargeclust$subject <- factor(strangedatlargeclust$subject) #
dropping unused levels (still don't know how to do that in one step)

# using nlme as in my previous post:

lmemodlargeclust <- lme(outcome ~ pred, data = strangedatlargeclust,
random = ~pred|subject)
summary(lmemodlargeclust)

# or using lme4
# in that case use:
# detach('package:nlme')
# require(lme4)

lmemodlargeclust <- lmer(outcome ~ pred + (pred|subject), data =
strangedatlargeclust)
summary(lmemodlargeclust)

For lme4 I obtain:

Linear mixed model fit by REML 
Formula: outcome ~ pred + (pred | subject) 
   Data: strangedatlargeclust 
   AIC   BIC logLik deviance REMLdev
 486.4 508.2 -237.2      464   474.4
Random effects:
 Groups   Name        Variance Std.Dev. Corr  
 subject  (Intercept) 0.0000   0.00000        
          pred        0.0000   0.00000    NaN 
 Residual             0.3168   0.56285        
Number of obs: 276, groups: subject, 46

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.21668    0.07068    3.07
pred         0.88911    0.02550   34.87

(For nlme things appear to be not that gross, but I guess that is only
appearance:
Linear mixed-effects model fit by REML
 Data: strangedatlargeclust 
       AIC      BIC    logLik
  486.4296 508.1084 -237.2148

Random effects:
 Formula: ~pred | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 1.423596e-05 (Intr)
pred        4.608392e-07 0     
Residual    5.628509e-01       

Fixed effects: outcome ~ pred 
                Value  Std.Error  DF  t-value p-value
(Intercept) 0.2166817 0.07067610 229  3.06584  0.0024
pred        0.8891061 0.02549993 229 34.86700  0.0000
 Correlation: 
     (Intr)
pred -0.878

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-4.12521685 -0.48108890  0.01259325  0.46206772  4.32871991 

Number of Observations: 276
Number of Groups: 46)

Since I would hope that using a mixed model would provide me with a
conditional model (in contrast to a marginal model), I would really want
a regression weight of pred that is near 0...

Best regards,
Rogier

-----Oorspronkelijk bericht-----
Van: David Duffy [mailto:davidD at qimr.edu.au] 
Verzonden: woensdag 28 juli 2010 1:04
Aan: Donders, Rogier
CC: r-sig-mixed-models at r-project.org
Onderwerp: Re: [R-sig-ME] problem fitting a mixed model due to between
subjectvariation

On Tue, 27 Jul 2010, R.Donders at ebh.umcn.nl wrote:

> Dear all,
>
> I have a problem with an mixed analysis. All though residual plots and
a
> fixed analysis suggest that (at least)
> a random intercept should be incorporated in the model, I keep getting
a
> zero variance (or almost zero variance) for the random intercept,
> I guess because of the huge between subject differences.
>
> summary(m2)

The random slopes model looks like:

> summary(m2)
Linear mixed model fit by REML
Formula: outcome ~ pred + (pred | subject)
    Data: strangedat
    AIC   BIC logLik deviance REMLdev
  781.4 805.6 -384.7      759   769.4
Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept) 0.095927 0.30972
           pred        0.015808 0.12573  -1.000
  Residual             0.340696 0.58369
Number of obs: 417, groups: subject, 80

Fixed effects:
             Estimate Std. Error t value
(Intercept)  0.31775    0.07503   4.235
pred         0.85699    0.02806  30.537

Correlation of Fixed Effects:
      (Intr)
pred -0.901

The random intercepts model gave:
  AIC   BIC logLik deviance REMLdev
  783 799.2 -387.5      764     775

It looks to me that the largest clusters tend to run horizontally wrt 
pred, that is across the strong between-clusters line of identity.

with(strangedat, plot(outcome ~ pred, col=as.integer(subject)))
z1 <- by(strangedat, strangedat$subject, mean)
z2 <- matrix(unlist(z1),nc=3,byrow=T)
z3 <- as.data.frame(z2[,-1])
names(z3) <- c("outcome","pred")
points(outcome ~ pred, data=z3, pch=15)

t1 <-  table(strangedat$subject)
sss <- split(strangedat, strangedat$subject)
library(locfit)
for(s in rownames(t1)[t1==6]) lines(locfit(outcome ~ pred, alpha=3, 
data=sss[[s]]))

This wasn't your homework, was it? ;)

Cheers, David Duffy.
-- 
| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v

Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in het handelsregister onder nummer 41055629.
The Radboud University Nijmegen Medical Centre is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.