[R-sig-ME] Help understanding residual variance

Thu Mar 29 17:29:52 CEST 2012

Hi Ista,

To me the onis is on the statistician consultant to explain *why* you
cannot have both random intercepts and slopes.  Does the consultant
have papers to reference or proofs?

In any case, this is hardly exclusive to 'R doing something strange'.
SAS and Stata happily join the gang.  See the attached file for code
and output from all three using a minidataset simulated in R.

I suppose one could bicker over whether a random intercept and slope
is a good idea, but possible it certainly is.  You might suggest that
it is poor fare to voice strong opinions about matters which one does
not understand.

Cheers,

Josh

On Thu, Mar 29, 2012 at 3:29 AM, Ista Zahn <istazahn at gmail.com> wrote:
> Hi Reinhold,
>
> Good question. My consultant didn't seem impressed when I tried to
> articulate that explanation, but perhaps I wasn't clear.
>
> Thanks,
> Ista
> On Thu, Mar 29, 2012 at 1:45 AM, Reinhold Kliegl
> <reinhold.kliegl at gmail.com> wrote:
>> But why is Greg Snow's response inadequate?
>>
>> Restating his argument:  In an LMM we are not estimating individual
>> random effects (means, slopes) and individual residuals, but variance
>> of random effects and variance of residuals. So there can be
>> differences between a subject's observed random effect and random
>> slope  and conditional modes of the distribution of the random effects
>> (i.e., the point of maximum density), given the observed data and
>> evaluated at the parameter estimates.
>>
>> I think your statistician's answer is a good argument that you must
>> not treat conditional modes as independent observations in a
>> subsequent analyses. For example, we showed with simulations that
>> correlations between conditional modes of slopes and intercepts are
>> larger than the correlation parameter estimated in the LMM (Kliegl,
>> Masson, & Richer, Visual Cognition, 2010).
>>
>> Reinhold Kliegl
>>
>> --
>> Reinhold Kliegl
>> http://read.psych.uni-potsdam.de/pmr2
>>
>> On Tue, Mar 27, 2012 at 4:18 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>> Hi all,
>>>
>>> I'm trying to understand what the residual variance in this model:
>>>
>>> tmp <- subset(sleepstudy, Days == 1 | Days == 9)
>>> m1 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = tmp)
>>> tmp$fitted1 <- fitted(m1)
>>>
>>> represents. The way I read this specification, an intercept and a
>>> slope is estimated for each subject. Since each subject only has two
>>> measurements, I would expect the Reaction scores to be completely
>>> accounted for by the slopes and intercepts. Yet they are not: the
>>> Residual variance estimate is 440.278.
>>>
>>> This is probably a stupid question, but I hope you will be kind enough
>>> to humor me.
>>>
>>> Best,
>>> Ista
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
-------------- next part --------------
############################################################
#                             R                            #
############################################################
set.seed(1)
d <- data.frame(y = c(y <- rnorm(100), y + rnorm(100)),
  x = rep(0:1, each = 100), id = factor(rep(1:100, 2)))

d <- d[order(d$id), ]

require(lme4)
summary(lmer(y ~ 1 + (1 + x | id), data = d))
## > summary(lmer(y ~ 1 + (1 + x | id), data = d))
## Linear mixed model fit by REML
## Formula: y ~ 1 + (1 + x | id)
##    Data: d
##    AIC   BIC logLik deviance REMLdev
##  548.6 565.1 -269.3    535.6   538.6
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  id       (Intercept) 0.60384  0.77707
##           x           0.50393  0.70988  0.366
##  Residual             0.20293  0.45047
## Number of obs: 200, groups: id, 100

## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.10885    0.08982   1.212

require(nlme)
summary(lme(y ~ 1, random = ~ 1 + x | id, data = d))

## > summary(lme(y ~ 1, random = ~ 1 + x | id, data = d))
## Linear mixed-effects model fit by REML
##  Data: d
##        AIC      BIC    logLik
##   548.6301 565.0967 -269.3151

## Random effects:
##  Formula: ~1 + x | id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr
## (Intercept) 0.8231605 (Intr)
## x           0.8071236 0.193
## Residual    0.3594008

## Fixed effects: y ~ 1
##                 Value  Std.Error  DF  t-value p-value
## (Intercept) 0.1088521 0.08981989 100 1.211893  0.2284

## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max
## -1.18533542 -0.26837701 -0.03513932  0.29186404  1.25726715

## Number of Observations: 200
## Number of Groups: 100

require(foreign)
write.dta(d, file = "d:/d.dta")
write.csv(d, file = "d:/d.csv", row.names = FALSE)

############################################################
#                            SAS                           #
############################################################

# options nocenter nolabel nodate formchar="|----|+|---+=|-/\<>";
# PROC IMPORT OUT= WORK.d
#             DATAFILE= "D:\d.csv"
#             DBMS=CSV REPLACE;
#      GETNAMES=YES;
#      DATAROW=2;
# RUN;
#
# proc mixed data=d method=reml noclprint;
#   class id;
#   model y = / solution;
#   random int x / subject=id type=un;
# run;

## The Mixed Procedure
##                   Model Information
## Data Set                     WORK.D
## Dependent Variable           y
## Covariance Structure         Unstructured
## Subject Effect               id
## Estimation Method            REML
## Residual Variance Method     Profile
## Fixed Effects SE Method      Model-Based
## Degrees of Freedom Method    Containment

##             Dimensions
## Covariance Parameters             4
## Columns in X                      1
## Columns in Z Per Subject          2
## Subjects                        100
## Max Obs Per Subject               2

## Number of Observations
## Number of Observations Read             200
## Number of Observations Used             200
## Number of Observations Not Used           0
##                      Iteration History
## Iteration    Evaluations    -2 Res Log Like       Criterion
##         0              1       615.81799232
##         1              4       545.16825286      0.05801788
##         2              1       539.21410924      0.00597671
##         3              1       538.64539097      0.00017233
##         4              1       538.63016700      0.00000020
##         5              1       538.63014985      0.00000000
##                    Convergence criteria met.

##  Covariance Parameter Estimates
## Cov Parm     Subject    Estimate
## UN(1,1)      id           0.3519
## UN(2,1)      id           0.4540
## UN(2,2)      id                0
## Residual                  0.4549

##            Fit Statistics
## -2 Res Log Likelihood           538.6
## AIC (smaller is better)         544.6
## AICC (smaller is better)        544.8
## BIC (smaller is better)         552.4

##   Null Model Likelihood Ratio Test
##     DF    Chi-Square      Pr > ChiSq
##      2         77.19          <.0001

##                    Solution for Fixed Effects
##                          Standard
## Effect       Estimate       Error      DF    t Value    Pr > |t|
## Intercept      0.1089     0.08982      99       1.21      0.2284

############################################################
#                          Stata                           #
############################################################

# use "d:/d.dta", clear
# xtmixed y || id: x, cov(un) reml
# di -2*e(ll)

## . use "d:/d.dta", clear
## (Written by R.              )

## . xtmixed y || id: x, cov(un) reml

## Performing EM optimization:

## Performing gradient-based optimization:

## Iteration 0:   log restricted-likelihood = -269.31507
## Iteration 1:   log restricted-likelihood = -269.31507  (backed up)

## Computing standard errors:

## Mixed-effects REML regression                   Number of obs      =       200
## Group variable: id                              Number of groups   =       100

##                                                 Obs per group: min =         2
##                                                                avg =       2.0
##                                                                max =         2

##                                                 Wald chi2(0)       =         .
## Log restricted-likelihood = -269.31507          Prob > chi2        =         .

## ------------------------------------------------------------------------------
##            y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
## -------------+----------------------------------------------------------------
##        _cons |   .1088521   .0898198     1.21   0.226    -.0671914    .2848956
## ------------------------------------------------------------------------------

## ------------------------------------------------------------------------------
##   Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
## -----------------------------+------------------------------------------------
## id: Unstructured             |
##                        sd(x) |   .8071714   31.05985      1.42e-33    4.58e+32
##                    sd(_cons) |   .8231815   15.22811      1.48e-16    4.59e+15
##                corr(x,_cons) |   .1930683   48.73245            -1           1
## -----------------------------+------------------------------------------------
##                 sd(Residual) |   .3593491    34.8834      8.44e-84    1.53e+82
## ------------------------------------------------------------------------------
## LR test vs. linear regression:       chi2(3) =    77.19   Prob > chi2 = 0.0000

## Note: LR test is conservative and provided only for reference.

## . di -2*e(ll)
## 538.63015