[R-sig-ME] contrasts vs. directly modelling differences: big diff?

Wed Sep 29 16:06:23 CEST 2021

Dear Guillaume,

 As Thierry says, yield is the response of your experiment, yet you choose to define yield loss as delta(yield)/yield rather than the more obvious delta(yield).

Clearly if delta(yield) were your measure of yield loss, then yield would be the appropriate outcome to work with.

But if you prefer delta(yield)/yield as your measure of loss, then this is different and the appropriate outcome is log(yield). This follows directly because delta(log(yield)) is delta(yield)/yield. 

Better still, work with 100 log(yield) as the outcome, then differences in outcome can be viewed as being in % units. Read more here: https://doi.org/10.1136/BMJ.J3683 

Best wishes,
Tim Cole

Population Policy and Practice
UCL Great Ormond Street Institute of Child Health
30 Guilford Street, London WC1N 1EH
    ------------------------------
    Date: Wed, 29 Sep 2021 14:27:40 +0200
    From: Thierry Onkelinx <thierry.onkelinx using inbo.be>
    To: Guillaume Adeux <guillaumesimon.a2 using gmail.com>
    Cc: R-mixed models mailing list <r-sig-mixed-models using r-project.org>
    Subject: Re: [R-sig-ME]  contrasts vs. directly modelling differences:
    	big diff?
    Message-ID:
    	<CAJuCY5zPg+z7uD6Xa8Q_o3DsVE2w6yRfH1ors0Tw=FGZUi1hjw using mail.gmail.com>
    Content-Type: text/plain; charset="utf-8"

    Dear Guillaume,

    I prefer to model the yield as that is the response of your experiment. It
    is easier to find a suitable distribution for the yield.

    Best regards,

    ir. Thierry Onkelinx
    Statisticus / Statistician

    Vlaamse Overheid / Government of Flanders
    INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
    FOREST
    Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
    thierry.onkelinx using inbo.be
    Havenlaan 88 bus 73, 1000 Brussel
    https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.inbo.be%2F&data=04%7C01%7Ctim.cole%40ucl.ac.uk%7Cd7980087949e4ef4d2fb08d98345c822%7C1faf88fea9984c5b93c9210a11d9a5c2%7C0%7C0%7C637685158208166257%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=5eQEnxVLmZDHOnwEoVjOvRZYEDYwmQEBlUekB9tN0ng%3D&reserved=0

    ///////////////////////////////////////////////////////////////////////////////////////////
    To call in the statistician after the experiment is done may be no more
    than asking him to perform a post-mortem examination: he may be able to say
    what the experiment died of. ~ Sir Ronald Aylmer Fisher
    The plural of anecdote is not data. ~ Roger Brinner
    The combination of some data and an aching desire for an answer does not
    ensure that a reasonable answer can be extracted from a given body of data.
    ~ John Tukey
    ///////////////////////////////////////////////////////////////////////////////////////////

    <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.inbo.be%2F&data=04%7C01%7Ctim.cole%40ucl.ac.uk%7Cd7980087949e4ef4d2fb08d98345c822%7C1faf88fea9984c5b93c9210a11d9a5c2%7C0%7C0%7C637685158208166257%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=SDvKjBFdx0bQwMqQSj2O7ed%2BeaxVDGnphhrx2ZEzFUc%3D&reserved=0>

    Op wo 29 sep. 2021 om 14:04 schreef Guillaume Adeux <
    guillaumesimon.a2 using gmail.com>:

    > Good afternoon everyone,
    >
    > I have this recurring question of whether it is best to directly model the
    > response as % differences (e.g. Yield_loss=(Yield_without_weeds -
    > Yield_with_weeds)/(Yield_without_weeds)) or whether it is best to directly
    > model the response (e.g. Yield) and compute yield loss through post hoc
    > contrasts on the log scale.
    >
    > I hope this following example can illustrate better:
    > Let's take two different weed communities: WC1 and WC2.
    > Each community is present on 6 fields, with multiple samples per field.
    > Next to each weedy sample of WC1 and WC2 within each of the 6 fields, there
    > is a hand weed control, inducing a hierarchical structure (paired data
    > within each field for each of the two weed communities).
    > The objective is to compute yield loss induced by the two communities and
    > to compare them.
    > One option could be to directly compute yield loss (e.g.
    > Yield_loss=(Yield_without_weeds - Yield_with_weeds)/(Yield_without_weeds))
    > for each weedy/weeded couple within each field and model
    > *
    > mod0=glmer(Yield_loss~WC+(1|field)+(1|field:WC),family="binomial"),data=yl)*
    > (I
    > suppose beta or beta_binomial would also be a reasonable choice but it's
    > not the matter of today). Comparisons could then be made with
    > *cld(emmeans(mod0,~WC))*
    >
    > Another option could be to directly model the response (e.g. Yield),
    > introduce a "Handweeding" (yes/no) variable and compute Yield loss through
    > the following code:
    >
    > *mod1=lmer(log(Yield)~Handweeding*WC+(1|field)+(1|field:Handweeding)+(1|field:WC)+(1|field:Handweeding:WC),data=yl)*
    >
    > *x=pairs(emmeans(mod1,~Handweeding|WC),reverse=TRUE)
    > y=regrid(x,transform="response")
    > *# differences on the log scale are exponentiated
    > *summary(y,infer=c(TRUE,TRUE),null=1) *# is yield loss significantly
    > different from 0 for each of the 2 community?
    > *cld(emmeans(y,~WC),adjust="mvt") *# is yield loss induced by WC1 different
    > from yield loss induced by WC2?
    >
    > Are both these "procedures" correct? Which is preferable? Why?
    >
    > Do not hesitate to request further information if I wasn't clear enough.
    > Thanks a lot.
    > Guillaume ADEUX