[R-sig-ME] Significance of B-spline components in mixed-effects logistic regression (glmer)

Anne Lerche @nne@lerche @ending from uni-leipzig@de
Fri Sep 21 19:04:34 CEST 2018


Dear Ben,
thank you very much for your very quick reply. It is reassuring that  
even though this is one of the first times I am using splines, I seem  
to be doing it correctly and I can stick to my lme4 model instead of  
delving into gamm4.
I really liked the fortune you sent along in this connection.

Best, Anne

Quoting Ben Bolker <bbolker using gmail.com>:

> fortunes::fortune("should be done")
>
>   The ANOVA comparisons should be all you need.  car::Anova() or drop1()
> or afex::mixed() are all convenience functions for that.  Since
> parameters for splines are harder to interpret, you could just leave out
> that part of the parameter table ...
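>
> For instance, something along the lines of
>
>   drop1(fin.mod, test = "Chisq")
>
> (with fin.mod being your fitted model from below) gives a single
> likelihood-ratio test for the whole bs(IV1, ...) term, which is the
> number a reviewer can be pointed to.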
>
>   The freakonometrics post you cite concludes:
>
>>  So, it looks like having a lot of non significant components in a
> spline regression is not a major issue. And reducing the degrees of
> freedom is clearly a bad option.
>
> Furthermore, stepwise throwing-away of terms is a recipe for messing up
> your inference (snooping/garden of forking paths).
>
> Your modeling approach looks fine; you *could* use gamm4 to get
> penalized regression splines, but again, it's better from an
> inferential/non-snooping point of view to pick a sensible model and
> stick with it unless it's obviously problematic.
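>
> For the record, an (untested) gamm4 sketch with a penalized smooth of
> IV1 would look roughly like this:
>
>   library(gamm4)                      # loads mgcv for s()
>   gm <- gamm4(DV ~ s(IV1) + IV2 + IV3,
>               random = ~ (1 | RV1) + (1 | RV2),
>               data = df, family = binomial)
>   summary(gm$gam)                     # single approximate test per smooth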
>
>   On a technical level, it's not clear whether the "discrepancy" (not
> really) between the summary() results and the anova() results is due to
> (1) the combined effect of a term with several components being
> significant even when the individual components are not, or (2) the
> difference between Wald tests (used in summary) and likelihood-based
> tests (used in anova()).  This could be disentangled, but IMO it's only
> worth it from a pedagogical/exploratory perspective.
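>
> (If you did want to check: comparing a joint Wald test of the spline
> term with the corresponding likelihood-ratio test would separate the
> two explanations, e.g.
>
>   car::Anova(fin.mod)                # joint Wald chi-square per term
>   drop1(fin.mod, test = "Chisq")     # likelihood-ratio test per term
>
> If the joint Wald test is already significant, (1) is the main story;
> if only the LR test is, (2) matters as well.)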
>
>   Ben Bolker
>
>
>
>
> On 2018-09-21 10:54 AM, Anne Lerche wrote:
>> Good afternoon,
>> I have a problem with reporting the significance of B-spline components in
>> a mixed-effects logistic regression fit with lme4 (prompted by a reviewer's
>> comment on a paper). After several hours of searching the literature,
>> forums and the internet more generally, I have not found a solution and
>> therefore turn to the recipients of this mailing list for help. (My
>> questions are at the very end of this mail.)
>>
>> I am trying to model the change in the use of a linguistic variable on the
>> basis of corpus data. My dataset contains a binary dependent variable
>> (DV, variant "a" or "b" being used), two random-effect grouping factors
>> (RV1 and RV2, both categorical) and three predictors (IV1 = time,
>> IV2 = another numeric variable, IV3 = a categorical variable with 7 levels).
>>
>> I wasn't sure if I should attach my (modified) dataset, so I'm trying to
>> produce an example. Unfortunately, it doesn't give the same results as
>> my original dataset.
>>
>> library(lme4)
>> library(splines)
>> library(languageR)
>>
>> df <- dative[dative$Modality == "spoken", ]          # spoken subset of the dative data
>> df <- df[, c("RealizationOfRecipient", "Verb", "Speaker",
>>              "LengthOfTheme", "SemanticClass")]
>> colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")   # rename to the variable roles above
>> set.seed(130)
>> df$IV1 <- sample(1:13, nrow(df), replace = TRUE)      # simulated "time" predictor
>>
>> My final regression model looks like this (treatment contrast coding):
>> fin.mod <- glmer(DV ~ bs(IV1, knots = c(5, 9), degree = 1) + IV2 + IV3 +
>>                    (1 | RV1) + (1 | RV2),
>>                  data = df, family = binomial)
>> summary(fin.mod, corr = FALSE)
>>
>> where the effect of IV1 is modelled as a B-spline with 2 knots and
>> degree 1. anova() comparisons (on the original dataset) show that this
>> model performs significantly better than a) a model without the B-spline
>> term for IV1 (i.e., dropping bs(IV1, knots=c(5,9), degree=1) altogether),
>> b) a model with IV1 as a linear predictor (not using bs), c) a model with
>> the df of the spline specified instead of the knots (df=3), so that bs()
>> chooses the knots itself, and d) a model with only 2 df (bs(IV1, df=2,
>> degree=1)). I also ran comparisons against models with quadratic or cubic
>> splines, and my final model still performs significantly better.
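>>
>> (In code, those comparisons were roughly of the following form, where
>> mod.a and mod.b are just placeholder names:
>>
>> mod.a <- update(fin.mod, . ~ . - bs(IV1, knots = c(5, 9), degree = 1))
>> mod.b <- update(fin.mod, . ~ . - bs(IV1, knots = c(5, 9), degree = 1) + IV1)
>> anova(mod.a, fin.mod)   # spline term vs. dropping IV1 entirely
>> anova(mod.b, fin.mod)   # spline vs. linear IV1
>>
>> and analogously for the df-based variants.)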
>>
>> The problem is that I am reporting this final model in a paper, and one
>> of the reviewers comments that I am reporting a non-significant effect
>> of IV1 because according to the coefficients table the variable does not
>> seem to have a significant effect (outlier correction does not make a
>> big difference):
>>
>> Fixed effects:
>>                                        Estimate Std. Error z value Pr(>|z|)
>> (Intercept)                             0.52473    0.50759   1.034    0.301
>> bs(IV1, knots = c(5, 9), degree = 1)1  -0.93178    0.59162  -1.575    0.115
>> bs(IV1, knots = c(5, 9), degree = 1)2   0.69287    0.43018   1.611    0.107
>> bs(IV1, knots = c(5, 9), degree = 1)3  -0.19389    0.61144  -0.317    0.751
>> IV2                                     0.47041    0.11615   4.050 5.12e-05 ***
>> IV3level2                               0.30149    0.53837   0.560    0.575
>> IV3level3                               0.15682    0.48760   0.322    0.748
>> IV3level4                              -0.89664    0.18656  -4.806 1.54e-06 ***
>> IV3level5                              -2.90305    0.68119  -4.262 2.03e-05 ***
>> IV3level6                              -0.32081    0.29438  -1.090    0.276
>> IV3level7                              -0.07038    0.87727  -0.080    0.936
>> (coefficients table of the sample dataset will differ)
>>
>> I know that the results of anova comparisons and what the coefficients
>> table shows are two different things (as in the case of IV3, which also
>> significantly improves model quality when added to the regression even
>> though only a few levels show significant contrasts).
>>
>> My questions are:
>> How can I justify reporting my regression model when the regression
>> table shows only non-significant components for the b-spline term? (Is
>> it enough to point to the anova comparisons?)
>> Is it possible to keep only some components of the B-spline (as
>> suggested here for linear regression:
>> https://freakonometrics.hypotheses.org/47681)?
>> Is there a better way of modeling the data? I am not very familiar with
>> gamm4 or nlme, for example.
>>
>> Any help is very much appreciated!
>> Thank you,
>> Anne
>>
>>
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



-- 
Anne Lerche
Institute of British Studies
Leipzig University
Beethovenstraße 15
04107 Leipzig

Phone: +493419737407
http://anglistik.philol.uni-leipzig.de/de/institut/linguistik/mitarbeiter/anne-lerche


