[R-sig-ME] random effect variance greater than output variable variance

Thierry Onkelinx thierry.onkelinx using inbo.be
Thu Nov 10 10:19:02 CET 2022


Dear Norman,

I think this might be due to the imbalance in your design. Inspect the
BLUPs of the random effects and look for the extremes among the location
and variety effects. I would expect some combinations in which an extreme
positive (negative) location effect is compensated by an extreme negative
(positive) variety effect.
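
As a rough sketch of that inspection, assuming lme4 is installed: the data
below are simulated, not yours, and the names dat, fit, blup_loc, blup_var
and cells_obs are made up for the illustration. With your model you would
call ranef(MELM.1) instead of fitting anything new.

```r
library(lme4)

set.seed(1)
# Hypothetical unbalanced design: 16 locations, 22 varieties, and every
# variety is grown at only 4 of the locations.
n_loc <- 16
n_var <- 22
loc_eff <- rnorm(n_loc, sd = 700)
var_eff <- rnorm(n_var, sd = 300)
cells <- do.call(rbind, lapply(seq_len(n_var), function(v) {
  data.frame(Variety = v, Location = sample(n_loc, 4))
}))
dat <- cells[rep(seq_len(nrow(cells)), each = 5), ]  # 5 plots per cell
dat$Yield <- 3000 + loc_eff[dat$Location] + var_eff[dat$Variety] +
  rnorm(nrow(dat), sd = 330)
dat$Location <- factor(dat$Location)
dat$Variety <- factor(dat$Variety)

fit <- lmer(Yield ~ 1 + (1 | Location) + (1 | Variety), data = dat)

# ranef() returns one data frame of conditional modes (BLUPs) per
# grouping factor, with the factor levels as row names.
re <- ranef(fit)
blup_loc <- setNames(re$Location[, "(Intercept)"], rownames(re$Location))
blup_var <- setNames(re$Variety[, "(Intercept)"], rownames(re$Variety))

# The most extreme levels of each factor
print(range(blup_loc))
print(range(blup_var))

# Attach the BLUPs to every observed location x variety combination.
# A strongly negative product flags a cell where a large effect of one
# sign meets a large effect of the opposite sign.
cells_obs <- unique(dat[, c("Location", "Variety")])
cells_obs$blup_loc <- blup_loc[as.character(cells_obs$Location)]
cells_obs$blup_var <- blup_var[as.character(cells_obs$Variety)]
cells_obs$product <- cells_obs$blup_loc * cells_obs$blup_var
print(head(cells_obs[order(cells_obs$product), ]))
```

Sorting by the product is only one convenient way to surface such
combinations; a scatter plot of blup_loc against blup_var per observed cell
would show the same thing.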

Furthermore, look into the fixed effects. Long.term.Apr.Jun is highly
correlated with Long.term.total, and their estimates (-10.19 and 9.81)
largely cancel each other. I recommend replacing Long.term.total with its
difference from Long.term.Apr.Jun.
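
To illustrate why this reparameterisation helps, here is a small sketch in
base R. It uses simulated data and plain lm() rather than your mixed model,
and the names apr_jun, rest, total and yield are invented for the example.
The algebra is b_A * A + b_T * T = (b_A + b_T) * A + b_T * (T - A), so the
fit does not change, but the Apr-Jun coefficient becomes the sum of the two
original coefficients.

```r
set.seed(42)
n <- 200
apr_jun <- rnorm(n, mean = 150, sd = 30)  # hypothetical Apr-Jun rainfall
rest <- rnorm(n, mean = 400, sd = 60)     # hypothetical rainfall outside Apr-Jun
total <- apr_jun + rest
yield <- 500 - 0.4 * apr_jun + 9.8 * rest + rnorm(n, sd = 300)

fit_orig <- lm(yield ~ apr_jun + total)   # original parameterisation
fit_new <- lm(yield ~ apr_jun + rest)     # total replaced by total - apr_jun

b_orig <- coef(fit_orig)
b_new <- coef(fit_new)

# Same fitted values: the two models span the same column space.
same_fit <- isTRUE(all.equal(fitted(fit_orig), fitted(fit_new)))

# New Apr-Jun coefficient = sum of the two original coefficients;
# the coefficient of the remainder equals the original 'total' coefficient.
sum_ok <- isTRUE(all.equal(unname(b_new["apr_jun"]),
                           unname(b_orig["apr_jun"] + b_orig["total"])))
rest_ok <- isTRUE(all.equal(unname(b_new["rest"]), unname(b_orig["total"])))
print(c(same_fit = same_fit, sum_ok = sum_ok, rest_ok = rest_ok))

# Applying the same identity to the estimates reported in the summary:
implied_apr_jun <- -10.1864 + 9.8103
print(implied_apr_jun)

# With the real data this would be something like (Long.term.rest is a
# hypothetical column name):
#   yield.disease.rainfall.df$Long.term.rest <-
#     with(yield.disease.rainfall.df, Long.term.total - Long.term.Apr.Jun)
# followed by refitting with Long.term.rest in place of Long.term.total.
```

Under this identity, the reported estimates imply an Apr-Jun effect of
about -0.38 once the remaining rainfall is held fixed, which is much easier
to interpret than two large coefficients of opposite sign.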

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


On Wed 9 Nov 2022 at 21:40, Norman DAURELLE <
norman.daurelle using agroparistech.fr> wrote:

>
> Dear Thierry,
>
> I used these lines:
>
> MELM.1 <- lmer(Yield..kg.Ha. ~ Rep.severity.means + Long.term.Apr.Jun +
>                  Long.term.total +
>                  (1|Location) + (1|Year) + (1|Variety),
>                data = yield.disease.rainfall.df)
>
> summary(MELM.1)
>
> and compared the output of the summary
>
> > summary(MELM.1)
> Linear mixed model fit by REML ['lmerMod']
> Formula: Yield..kg.Ha. ~ Rep.severity.means + Long.term.Apr.Jun +
> Long.term.total +
>     (1 | Location) + (1 | Year) + (1 | Variety)
>    Data: yield.disease.rainfall.df
>
> REML criterion at convergence: 19679.6
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -4.1926 -0.5998 -0.0246  0.5572  5.0190
>
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  Variety  (Intercept) 106888   326.9
>  Location (Intercept) 512674   716.0
>  Year     (Intercept)  15724   125.4
>  Residual             109754   331.3
> Number of obs: 1352, groups:  Variety, 22; Location, 16; Year, 4
>
> Fixed effects:
>                    Estimate Std. Error t value
> (Intercept)        160.9075   236.6696   0.680
> Rep.severity.means  -3.7333     0.6512  -5.733
> Long.term.Apr.Jun  -10.1864     0.8009 -12.719
> Long.term.total      9.8103     0.4631  21.182
>
> Correlation of Fixed Effects:
>             (Intr) Rp.sv. L..A.J
> Rp.svrty.mn -0.038
> Lng.trm.A.J -0.061 -0.061
> Lng.trm.ttl -0.314  0.016 -0.699
>
> with the var() of my output variable:
>
> > var(yield.disease.rainfall.df$Yield..kg.Ha.)
> [1] 435938
>
> and it bothers me that this variance is smaller than the Location
> variance reported for the random effects in the summary, because it
> prevents me from presenting the results the way I intended. I
> wanted to show how much each factor (year, location, and variety/cultivar)
> influences yield beyond disease severity and rainfall.
>
> Have I misunderstood what the variance values for the random effects
> in the summary mean?
> Can they not be compared with the var() of my variable of interest?
>
> Thanks!
>
> Norman
>
>
>
> ------------------------------
> *From: *"Thierry Onkelinx" <thierry.onkelinx using inbo.be>
> *To: *"Norman DAURELLE" <norman.daurelle using agroparistech.fr>
> *Cc: *"r-sig-mixed-models" <r-sig-mixed-models using r-project.org>
> *Sent: *Wednesday 9 November 2022 09:34:07
> *Subject: *Re: [R-sig-ME] random effect variance greater than output
> variable variance
>
> Dear Norman,
>
> Can you show us the full code of the lme4 call and the output of
> summary(model)? How did you calculate the variances of Y and of the random
> effect?
>
> Best regards,
>
> ir. Thierry Onkelinx
>
> On Tue 8 Nov 2022 at 17:37, Norman DAURELLE via R-sig-mixed-models <
> r-sig-mixed-models using r-project.org> wrote:
>
>>
>> Dear list members,
>>
>> I used a linear mixed-effects model to estimate the effect of a disease on
>> the yield of a crop, with the following formula:
>>
>> Y ~ X + R1 + R2 + (1|year) + (1|location) + (1|cultivar)
>>
>> where, for each observation:
>>
>> Y is the yield of the crop,
>> X is the average disease severity in the field,
>> R1 and R2 are the rainfall values in the first and second parts of the
>> growing season respectively,
>> and year, location and cultivar are the year, location and cultivar of the
>> observation.
>>
>> I have 5 years, 16 locations and a lot of cultivars, with an unbalanced
>> experiment design.
>>
>> The variance given in the summary for the factor Location is greater than
>> the variance of the yield variable taken by itself, and this surprises me.
>>
>> I wanted to show the relative importance of each factor over yield
>> through a Venn diagram presenting the variances of each factor as part of
>> the overall yield variance, with each factor's variance overlapping with
>> the others', but the fact that the variance associated with a factor is
>> greater than the variance of the output variable makes me doubt my
>> understanding of the variances shown in a summary for a mixed-effect model.
>>
>> Would someone have a simple explanation of what exactly these variances
>> represent?
>>
>> I thought that for a factor with N levels, V = (Σ (x_i - μ)²) / N,
>> with i = 1, ..., N, where x_i is the mean of the output variable in the
>> i-th level of the factor and μ is the overall mean of the output variable.
>>
>> Is this not how the variance of a random effect is computed?
>>
>> Thanks for any answer!
>>
>> Cheers,
>>
>> Norman
>
>
