[R-sig-ME] random effect variance greater than output variable variance

Wed Nov 9 21:33:12 CET 2022

Dear Thierry, 

I used these lines:

MELM.1 <- lmer(Yield..kg.Ha. ~ Rep.severity.means + Long.term.Apr.Jun + Long.term.total 
+ (1|Location) + (1|Year) + (1|Variety), 
data = yield.disease.rainfall.df) 

summary(MELM.1) 

and compared the outputs of the summary 

summary(MELM.1) 
Linear mixed model fit by REML ['lmerMod'] 
Formula: Yield..kg.Ha. ~ Rep.severity.means + Long.term.Apr.Jun + Long.term.total + 
(1 | Location) + (1 | Year) + (1 | Variety) 
Data: yield.disease.rainfall.df 

REML criterion at convergence: 19679.6 

Scaled residuals: 
Min 1Q Median 3Q Max 
-4.1926 -0.5998 -0.0246 0.5572 5.0190 

Random effects: 
Groups Name Variance Std.Dev. 
Variety (Intercept) 106888 326.9 
Location (Intercept) 512674 716.0 
Year (Intercept) 15724 125.4 
Residual 109754 331.3 
Number of obs: 1352, groups: Variety, 22; Location, 16; Year, 4 

Fixed effects: 
Estimate Std. Error t value 
(Intercept) 160.9075 236.6696 0.680 
Rep.severity.means -3.7333 0.6512 -5.733 
Long.term.Apr.Jun -10.1864 0.8009 -12.719 
Long.term.total 9.8103 0.4631 21.182 

Correlation of Fixed Effects: 
(Intr) Rp.sv. L..A.J 
Rp.svrty.mn -0.038 
Lng.trm.A.J -0.061 -0.061 
Lng.trm.ttl -0.314 0.016 -0.699 

to var() of my output variable : 

> var(yield.disease.rainfall.df$Yield..kg.Ha.) 
[1] 435938 

and it bothers me that this variance is smaller than the Location variance reported under random effects in the summary, because it prevents me from using the method I had planned for presenting the results. I wanted to show how much each factor (year, location, and variety/cultivar) influences yield, beyond disease severity and rainfall.

Do I misunderstand what these variance values mean for the random effects in the summary?
Can they not be compared to the var() of my variable of interest?
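
For reference, the components printed in the summary above can be turned into shares of their own sum. This is only a sketch: it assumes the four components add up to the variance of yield around the fixed-effects prediction, which need not equal var() of the raw yields in an unbalanced data set. The names vc, total and shares are illustrative and not part of lme4.

```r
# Variance components copied from the summary(MELM.1) output above.
vc <- c(Variety = 106888, Location = 512674, Year = 15724, Residual = 109754)

# Model-implied variance around the fixed-effects prediction:
# the sum of the random-effect variances and the residual variance.
total <- sum(vc)

# Share of that total attributed to each component.
shares <- vc / total
print(round(shares, 3))
```

Because the shares are taken relative to the model-implied total rather than to var(Yield..kg.Ha.), they sum to 1 even when a single component exceeds the sample variance.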

Thanks ! 

Norman 

From: "Thierry Onkelinx" <thierry.onkelinx using inbo.be> 
To: "Norman DAURELLE" <norman.daurelle using agroparistech.fr> 
Cc: "r-sig-mixed-models" <r-sig-mixed-models using r-project.org> 
Sent: Wednesday 9 November 2022 09:34:07 
Subject: Re: [R-sig-ME] random effect variance greater than output variable variance 

Dear Norman, 

Can you show us the full code of the lme4 call and the output of summary(model)? How did you calculate the variances for Y and the random effect?

Best regards, 

ir. Thierry Onkelinx 
Statisticus / Statistician 

Vlaamse Overheid / Government of Flanders 
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST 
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance 
thierry.onkelinx using inbo.be 
Havenlaan 88 bus 73, 1000 Brussel 
www.inbo.be 

/////////////////////////////////////////////////////////////////////////////////////////// 
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher 
The plural of anecdote is not data. ~ Roger Brinner 
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 
/////////////////////////////////////////////////////////////////////////////////////////// 

On Tue 8 Nov 2022 at 17:37, Norman DAURELLE via R-sig-mixed-models <r-sig-mixed-models using r-project.org> wrote: 

Dear list members, 

I used a mixed-effect linear model to estimate the effect of a disease on the yield of a crop, 
and used a formula that was as follows : 

Y ~ X + R1 + R2 + (1|year) + (1|location) + (1|cultivar) 

where for each observation : 

Y is the yield of the crop , 
X the average disease severity in the field, 
R1 and R2 the rainfall values in the 1st and 2nd part of the growing season respectively, 
and year, location and cultivar, the year location and cultivar of the observation. 

I have 5 years, 16 locations and a lot of cultivars, with an unbalanced experimental design. 

The variance given in the summary for the factor Location is greater than the variance of the yield variable taken by itself, and this surprises me. 

I wanted to show the relative importance of each factor for yield with a Venn diagram that presents each factor's variance as a part of the overall yield variance, with the factors' variances overlapping. However, a factor variance greater than the variance of the output variable makes me doubt my understanding of the variances shown in the summary of a mixed-effects model. 

Would someone have a simple explanation of what exactly these variances represent ? 

I thought that for a factor with N levels, you had V = ( Σ (x_i - μ)² ) / N, with i = 1, ..., N, where x_i is the mean of the output variable in the i-th level of the factor and μ is the overall mean of the output variable.

Is this not how the variance for a random effect is computed ? 
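
The formula above can be checked on simulated data. The sketch below uses made-up numbers (16 locations, a location standard deviation of 700, a residual standard deviation of 330, unequal group sizes); none of it comes from the real data set. It computes V from the raw group means and, if lme4 is installed, the REML estimate for comparison. The two are generally not the same: the raw group means also carry residual noise, while the model estimates the variance of the underlying location effects.

```r
set.seed(1)

# Hypothetical unbalanced design: 16 locations with unequal numbers of plots.
n_loc   <- 16
n_per   <- sample(5:40, n_loc, replace = TRUE)
loc     <- factor(rep(seq_len(n_loc), times = n_per))
loc_eff <- rnorm(n_loc, mean = 0, sd = 700)        # assumed location effects
y       <- 3000 + loc_eff[as.integer(loc)] + rnorm(length(loc), sd = 330)
d       <- data.frame(y = y, loc = loc)

# V as in the formula: mean squared deviation of the group means
# from the overall mean of the output variable.
group_means <- tapply(d$y, d$loc, mean)
V_naive     <- mean((group_means - mean(d$y))^2)

# Sample variance of the output variable, for comparison.
var_y <- var(d$y)

# REML estimate of the location variance, only if lme4 is available.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit    <- lme4::lmer(y ~ 1 + (1 | loc), data = d)
  vc     <- as.data.frame(lme4::VarCorr(fit))
  V_reml <- vc$vcov[vc$grp == "loc"]
  print(c(V_naive = V_naive, V_reml = V_reml, var_y = var_y))
} else {
  print(c(V_naive = V_naive, var_y = var_y))
}
```

Changing n_loc, n_per or the two standard deviations shows how the three quantities move relative to each other under different designs.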

Thanks for any answer ! 

Cheers, 

Norman 

[[alternative HTML version deleted]] 

_______________________________________________ 
R-sig-mixed-models using r-project.org mailing list 
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models 
