[R] Components of variance

Sun Jun 26 21:18:30 CEST 2005

Hi, Doug:

	  Thanks for your reply.  I'm keenly aware of many of the issues you've 
mentioned.  (I've learned much of what I know about this from your 
innovation research.)

	  Besides, "lme" provides one answer to that question, if not a great 
one.  Moreover, if someone is really concerned, couldn't they use 
"simulate.lme" in library(nlme), as you did in ch. 4(?) of Pinheiro and 
Bates (2000) Mixed-Effects Models in S and S-Plus (Springer) [Or they 
can request the French-language report that Renaud mentioned].

	  I only tried VarCorr(lmer(...)), because the email from you that 
Renaud cided sounded to me like it might be doable.  If I really wanted 
it, I could read your code.  I don't want it that badly.

	  Thanks again,
	  spencer graves

Douglas Bates wrote:

> On 6/26/05, Spencer Graves <spencer.graves at pdf.com> wrote:
> 
>>Hi, Renaud:
>>
>>          Thanks for the link to Doug Bates' recent comment on this.
>>Unfortunately, the information contained therein is not enough for me to
>>easily obtain all the required standard errors and write code to make
>>confidence intervals.  To be more specific, I'm able to produce the same
>>answers as before using lmer in place of lme as follows:
>>
>>
>>>library(lme4)
>>>set.seed(2)
>>>lot <- rep(LETTERS[1:3], each=9)
>>>lot.e <- rep(rnorm(3), each=9)
>>>wf <- paste(lot, rep(1:9, each=3))
>>>wf.e <- rep(rnorm(9), each=3)
>>>DF <- data.frame(lot=lot, wafer=wf,
>>
>>+ Vt=lot.e+wf.e+rnorm(27))
>>
>>>(fitr <- lmer(Vt~1+(1|lot)+(1|wafer), DF))
>>
>><snip>
>>Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  wafer    (Intercept) 0.96054  0.98007
>>  lot      (Intercept) 1.51434  1.23058
>>  Residual             1.34847  1.16124
>># of obs: 27, groups: wafer, 9; lot, 3
>>
>>Fixed effects:
>>             Estimate Std. Error DF t value Pr(>|t|)
>>(Intercept)  0.60839    0.81329 26  0.7481   0.4611
>>
>>          These numbers agree with what I got with lme.  I then followed your
>>suggetion to "http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html".
>>This suggested I try str(VarCorr(fitr)):
>>
>>
>>>vcFit <- VarCorr(fitr)
>>>str(vcFit)
>>
>>Formal class 'VarCorr' [package "Matrix"] with 3 slots
>>   ..@ scale   : num 1.16
>>   ..@ reSumry :List of 2
>>   .. ..$ wafer:Formal class 'corrmatrix' [package "Matrix"] with 2 slots
>>   .. .. .. ..@ .Data : num [1, 1] 1
>>   .. .. .. .. ..- attr(*, "dimnames")=List of 2
>>   .. .. .. .. .. ..$ : chr "(Intercept)"
>>   .. .. .. .. .. ..$ : chr "(Intercept)"
>>   .. .. .. ..@ stdDev: Named num 0.844
>>   .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
>>   .. ..$ lot  :Formal class 'corrmatrix' [package "Matrix"] with 2 slots
>>   .. .. .. ..@ .Data : num [1, 1] 1
>>   .. .. .. .. ..- attr(*, "dimnames")=List of 2
>>   .. .. .. .. .. ..$ : chr "(Intercept)"
>>   .. .. .. .. .. ..$ : chr "(Intercept)"
>>   .. .. .. ..@ stdDev: Named num 1.06
>>   .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
>>   ..@ useScale: logi TRUE
>>
>>          Unfortunately, I've been so far unable to translate this into
>>confidence intervals for the variance components that seem to match
>>intervals(lme(...)).  The closest I came was as follows:
>>
>>
>>>1.23058*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit at reSumry$lot at stdDev))
>>
>>[1] 0.4356044 3.4763815
>>
>>>0.98007*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit at reSumry$wafer at stdDev))
>>
>>[1] 0.4286029 2.2410887
>>
>>          The discrepancy between these numbers and intervals(lme(...)) is 25%
>>for "lot", which suggest that I'm doing something wrong.  Moreover, I
>>don't know how to get a similar interval for "scale", and I don't know
>>how to access the estimated variance components other than manually.
>>
>>          Comments?
>>          Thanks,
>>          spencer graves
>>p.s.  R 2.1.0 patched under Windows XP.
>>
>>Renaud Lancelot wrote:
>>
>>
>>>Spencer Graves a écrit :
>>>
>>>
>>>>       As far as I know, the best available is lme in library(nlme).  For
>>>>more information, see the the following:
>>>>
>>>>       Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus
>>>>(Springer)
>>>>
>>>>       Consider the following example:
>>>>
>>>>
>>>>>set.seed(2)
>>>>>lot <- rep(LETTERS[1:3], each=9)
>>>>>lot.e <- rep(rnorm(3), each=9)
>>>>>wf <- paste(lot, rep(1:9, each=3))
>>>>>wf.e <- rep(rnorm(9), each=3)
>>>>>DF <- data.frame(lot=lot, wafer=wf,
>>>>
>>>>+ Vt=lot.e+wf.e+rnorm(27))
>>>>
>>>>>(fit <- lme(Vt~1, random=~1|lot/wafer, DF))
>>>>
>>>>Linear mixed-effects model fit by REML
>>>>  Data: DF
>>>>  Log-restricted-likelihood: -48.44022
>>>>  Fixed: Vt ~ 1
>>>>(Intercept)
>>>>  0.6083933
>>>>
>>>>Random effects:
>>>> Formula: ~1 | lot
>>>>        (Intercept)
>>>>StdDev:    1.230572
>>>>
>>>> Formula: ~1 | wafer %in% lot
>>>>        (Intercept) Residual
>>>>StdDev:   0.9801403 1.161218
>>>>
>>>>Number of Observations: 27
>>>>Number of Groups:
>>>>           lot wafer %in% lot
>>>>             3              9
>>>>
>>>>>(CI.fit <- intervals(fit))
>>>>
>>>>Approximate 95% confidence intervals
>>>>
>>>> Fixed effects:
>>>>                lower      est.    upper
>>>>(Intercept) -1.100281 0.6083933 2.317068
>>>>attr(,"label")
>>>>[1] "Fixed effects:"
>>>>
>>>> Random Effects:
>>>>  Level: lot
>>>>                    lower     est.    upper
>>>>sd((Intercept)) 0.3368174 1.230572 4.495931
>>>>  Level: wafer
>>>>                   lower      est.    upper
>>>>sd((Intercept)) 0.426171 0.9801403 2.254201
>>>>
>>>> Within-group standard error:
>>>>    lower      est.     upper
>>>>0.8378296 1.1612179 1.6094289
>>>>
>>>>>str(CI.fit)
>>>>
>>>>List of 3
>>>> $ fixed   : num [1, 1:3] -1.100  0.608  2.317
>>>>  ..- attr(*, "dimnames")=List of 2
>>>>  .. ..$ : chr "(Intercept)"
>>>>  .. ..$ : chr [1:3] "lower" "est." "upper"
>>>>  ..- attr(*, "label")= chr "Fixed effects:"
>>>> $ reStruct:List of 2
>>>>  ..$ lot  :`data.frame':       1 obs. of  3 variables:
>>>>  .. ..$ lower: num 0.337
>>>>  .. ..$ est. : num 1.23
>>>>  .. ..$ upper: num 4.5
>>>>  ..$ wafer:`data.frame':       1 obs. of  3 variables:
>>>>  .. ..$ lower: num 0.426
>>>>  .. ..$ est. : num 0.98
>>>>  .. ..$ upper: num 2.25
>>>>  ..- attr(*, "label")= chr "Random Effects:"
>>>> $ sigma   : atomic [1:3] 0.838 1.161 1.609
>>>>  ..- attr(*, "label")= chr "Within-group standard error:"
>>>> - attr(*, "level")= num 0.95
>>>> - attr(*, "class")= chr "intervals.lme"
>>>>
>>>>>diff(log(CI.fit$sigma))
>>>>
>>>>   est.   upper
>>>>0.32641 0.32641
>>>>
>>>>       The last line combined with help for intervals.lme shows that the
>>>>confidence interval for sigma (and doubtless also for lot and wafer
>>>>variance components is based on a normal approximation for the
>>>>distribution of log(sigma).
>>>>
>>>>       The state of the art is reflected in "lmer" in library(lme4),
>>>>described in the following:
>>>>
>>>>       Doug Bates (2005) "Fitting linear mixed models in R" in R News 5/1
>>>>available from "www.r-project.org" -> Newsletter
>>>>
>>>>       However, an "intervals" function is not yet available for "lmer"
>>>>objects.
>>>
>>>
>>>The issue of variance components in lme4 was recently by D. Bates:
>>>
>>>http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html
>>>
>>>Also, a colleague wrote (in French) a nice report - with data and R code
>>>on how to compute variance components, their bias and distribution with
>>>lme (normal approximation , Monte Carlo simulation and delta method).
>>>Let me know if you want it.
>>>
>>>Best,
>>>
>>>Renaud
>>>
>>>
>>>
>>>
>>>>       spencer graves
>>>>
>>>>John Sorkin wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>Could someone identify a function that I might use to perform a
>>>>>components of variance analysis? In addition to the variance
>>>>>attributable to each factor, I would also like to obtain the SE of the
>>>>>variances.
>>>>>Thank you,
>>>>>John
>>>>>
>>>>>John Sorkin M.D., Ph.D.
>>>>>Chief, Biostatistics and Informatics
>>>>>Baltimore VA Medical Center GRECC and
>>>>>University of Maryland School of Medicine Claude Pepper OAIC
>>>>>
>>>>>University of Maryland School of Medicine
>>>>>Division of Gerontology
>>>>>Baltimore VA Medical Center
>>>>>10 North Greene Street
>>>>>GRECC (BT/18/GR)
>>>>>Baltimore, MD 21201-1524
>>>>>
>>>>>410-605-7119
>>>>>------ NOTE NEW EMAIL ADDRESS:
>>>>>jsorkin at grecc.umaryland.edu
> 
> 
> Current versions of lmer do not return a Hessian matrix from the
> optimization step so it would not be easy to create such intervals.  I
> do have an analytic gradient from which a Hessian could be derived by
> finite differences but even that wouldn't help without documentation
> on how the profiled deviance is defined and the particular
> parameterization used for the variance components.  These are not
> arbitrary - they are the results of a considerable amount of research
> and experimentation but I have not yet written it up.  It is not
> overly complicated but it is also not entirely straightforward.
> 
> However, I do not regard getting standard errors of the estimates of
> variance components as being important.  In fact I think I am doing a
> service to the community by *not* providing them because they are
> inevitably misused.
> 
> Think back to your first course in statistics when confidence
> intervals were introduced.  You learned that a confidence interval on
> the mean of a population (normally distributed or reasonably close to
> normally distributed) is derived as the sample mean plus or minus a
> multiple of the standard error of the mean.  What about a confidence
> interval on the variance of such a population?  If you calculated that
> as the estimate of the variance plus or minus a multiple of the
> standard error of that estimate then you probably lost points on that
> part of the assignment or exam because we don't calculate a confidence
> interval in that way.  We know that the distribution of S^2 is a
> scaled chi-squared distribution and often very far from symmetric and
> a confidence interval constructed in the manner described above would
> have poor properties and would not reflect the extent of our
> uncertainty about the parameter.  And we never discussed at all a
> hypothesis test on whether the population variance was zero.
> 
> Now let's go to a much more complicated situation with mixed effects
> models.  We know that in the simplest possible case it would be
> extremely misleading to create confidence intervals or perform
> hypothesis tests based on an estimate of a variance and a standard
> error of that estimate but we are perfectly happy to do so in the much
> more complicated situation of variance components or, even worse,
> estimates of covariances.  It doesn't make sense.  Even though SAS
> PROC MIXED and HLM and MLWiN and Stata and ... all produce such tests
> and such confidence intervals for mixed-effects models they still
> don't make sense.  There is no good justification for their use other
> than a vague hand-waving about asymptotics.
> 
> Until someone can explain to me a good use of quantities such as
> standard errors of estimates of variances or covariances, I'm not
> going to stop working on all of the other aspects of the code and
> documentation and redirect my effort to producing these.  If anyone
> wants to do it they can contact me off-list and I'll explain the
> optimization problem and how to get access to the parameterization and
> the gradient.  If you are really energetic I can point you to the
> formulae for an analytic Hessian that I haven't yet had time to
> implement.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915