[R-sig-ME] Modeling variance heterogeneity in lme4 + meaning of VarCorr values

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Oct 20 16:49:34 CEST 2014


Dear Emmanuel,

You don't need to create the dummy variables. Just convert Position to a new factor variable. It makes your code much more readable.

d$fPosition <- factor(d$Position)
lmer( Diametre ~ Lecteur + Position + ( 0 + fPosition|Lecture ), data = d, REML = FALSE )

This allows for different variances of the random effect over Lecture, depending on the value of fPosition. This comes at the cost of estimating all covariances as well. So you need to estimate 36 parameters (8 variances and 28 covariances). That is a high number of parameters given the size of your dataset.

A more efficient solution would be to use lme() from the nlme() package and allow for heterogeneity in the variance of the residuals.

lme(Diametre ~ Lecteur + Position, random = ~1|Lecture, weights = varIdent(~ 1|fPosition), data = d, REML = FALSE)

Best regards,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces op r-project.org [mailto:r-sig-mixed-models-bounces op r-project.org] Namens Emmanuel Curis
Verzonden: maandag 20 oktober 2014 14:45
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] Modeling variance heterogeneity in lme4 + meaning of VarCorr values

Hello,

I am currently working on estimating precision of artery diameter by an imaging technique. Every 5 mm along one artery, diameter is measured. This was done 10 times by the same reader, and another 10 times by a second reader, both on the same image. The aim is to have an idea of the standard deviation of the diameter.

At first glance, it seems to be much higher at both ends of the artery, probably because diameter changes quickly here hence a very small shift in the origin may have great influence on the diameter --- imagine a king of >----< shape.

To test this hypothesis, I built the following models in lme4, where « Diametre » is the diameter, « Lecteur » the reader, « Lecture » the reading and « Position » the position along the artery (as a factor); d is the data.frame with the data:

# Single variance model
md.simple <- lmer( Diametre ~ Lecteur + Position + ( 1|Lecture ),
                   data = d, REML = FALSE )

# Different variance for each position model #  1) building indicator variables for each position for ( i in 0:7 ) {
    d[ , paste0( 'I.', i ) ] <- ifelse( d$Position == i, 1, 0 ) }

# 2) fit the model
md.tout   <- lmer( as.formula( paste( "Diametre ~ Lecteur + Position +",
                                      paste0( "(0+I.", 0:7,"|Lecture)",
                                      collapse = "+" ) ) ),
                   data = d, REML = FALSE )

My questions are

1) does it seem a pertinent model to really have a different variance at each position?

2) I guess it may be partly undetermined, since variance at position is is the sum of the residual variance sigma² and the grouping term ( I.i|Lecture ) variance si², hence if variance at position i is for instance 1 and is the smallest of all variances, we can have si² = 1, sigma²=0, the otherr way round and any other combination such that the sum is 1. Is it right?

   lmer seems to fit the model with several si² = 0, which seems to mean that sigma² gets the "baseline" variance and terms are used only if local variance is « much » higher, is it true?

   It gives a warning « In checkZrank(reTrms$Zt, n = n, control, nonSmall = 1e+06) : number of observations <= rank(Z); variance-covariance matrix will be unidentifiable », I guess this is the result of unidentifiability. How far does it affects the results, especially fixed effects estimates [« Reader » effect] and further model comparisons?

2) When I try to reconstruct local variances using sigma( md.tout ) and VarCorr( md.tout ), to compare them to variances obtained by tapply( d$Diametre, d$Position, var ), I've noticed that actual standard deviations at position i are in attributes stddev of VarCorr.

This is indeed said in the documentation; however, I did not found what are the actual values stocked directly in VarCorr list. I guess it corresponds to variances in the rescaled model? If it is so, how is determined the rescaling parameter? If not, can anyone explain or give some reference to a technical doc what are these values, and the link with actual variances?

Thanks in advance,

--
                                Emmanuel CURIS
                                emmanuel.curis op parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html

_______________________________________________
R-sig-mixed-models op r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.



More information about the R-sig-mixed-models mailing list