[R-sig-ME] Significance of B-splines components in mixed-effects logistic regression (glmer)
@nne@lerche @ending from uni-leipzig@de
Fri Sep 21 19:04:34 CEST 2018
thank you very much for your very quick reply. It is reassuring that
even though this is one of the first times I am using splines, I seem
to be doing it correctly and I can stick to my lme4 model instead of
delving into gamm4.
I really liked the fortune you sent along in this connection.
Zitat von Ben Bolker <bbolker using gmail.com>:
> fortunes::fortune("should be done")
> The ANOVA comparisons should be all you need. car::Anova() or drop1()
> or afex::mixed() are all convenience functions for that. Since
> parameters for splines are harder to interpret, you could just leave out
> that part of the parameter table ...
> The freakonometrics post you cite concludes:
>> So, it looks like having a lot of non significant components in a
> spline regression is not a major issue. And reducing the degrees of
> freedom is clearly a bad option.
> Furthermore, stepwise throwing-away of terms is a recipe for messing up
> your inference (snooping/garden of forking paths).
> Your modeling approach looks fine; you *could* use gamm4 to get
> penalized regression splines, but again, it's better from an
> inferential/non-snooping point of view to pick a sensible model and
> stick with it unless it's obviously problematic.
> On a technical level, it's not clear whether the "discrepancy" (not
> really) between the summary() results and the anova() results is due to
> (1) the combined effect of a term with several components being
> significant even when the individual components are not; (2) the
> difference between Wald tests (used in summary) and likelihood-based
> tests (used in anova()). This could be disentangled, but IMO it's only
> worth it from a pedagogical/exploratory perspective.
> Ben Bolker
> On 2018-09-21 10:54 AM, Anne Lerche wrote:
>> Good afternoon,
>> I have a problem with reporting significance of b-splines components in
>> a mixed-effects logistic regression fit in lme4 (caused by a reviewer's
>> comment on a paper). After several hours of searching the literature,
>> forums and the internet more generally, I have not found a solution and
>> therefore turn to the recipients of this mailing list for help. (My
>> questions are at the very end of the mail)
>> I am trying to model the change in the use of linguistic variable on the
>> basis of corpus data. My dataset contains the binary dependent variable
>> (DV, variant "a" or "b" being used), 2 random variables (RV1 and RV2,
>> both categorical) and three predictors (IV1=time, IV2=another numeric
>> variable, IV3=a categorical variable with 7 levels).
>> I wasn't sure if I should attach my (modified) dataset, so I'm trying to
>> produce an example. Unfortunately, it doesn't give the same results as
>> my original dataset.
>> df <- dative[dative$Modality == "spoken",]
>> df <- df[,c("RealizationOfRecipient", "Verb", "Speaker",
>> "LengthOfTheme", "SemanticClass")]
>> colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")
>> df$IV1 <- sample(1:13, 2360, replace = TRUE)
>> My final regression model looks like this (treatment contrast coding):
>> fin.mod <- glmer(DV~bs(IV1, knots=c(5,9),
>> data=df, family=binomial)
>> summary(fin.mod, corr=FALSE)
>> where the effect of IV1 is modelled as a b-spline with 2 knots and a
>> degree of 1. Anova comparisons (of the original dataset) show that this
>> model performs significantly better than a) a model without IV1 modelled
>> as a b-spline (bs(IV1, knots=c(5,9), degree=1)), b) a model with IV1 as
>> a linear predictor (not using bs), c) a model with the df of the spline
>> specified instead of the knots (df=3), so that bs chooses knots
>> autonomously, and d) a model with only 2 df (bs(IV1, df=2, degree=1)). I
>> also ran comparisons with models with quadratic or cubis splines, and
>> still my final model performs significantly better.
>> The problem is that I am reporting this final model in a paper, and one
>> of the reviewers comments that I am reporting a non-significant effect
>> of IV1 because according to the coefficients table the variable does not
>> seem to have a significant effect (outlier correction does not make a
>> big difference):
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) 0.52473 0.50759 1.034 0.301
>> bs(IV1, knots = c(5, 9), degree = 1)1 -0.93178 0.59162 -1.575 0.115
>> bs(IV1, knots = c(5, 9), degree = 1)2 0.69287 0.43018 1.611 0.107
>> bs(IV1, knots = c(5, 9), degree = 1)3 -0.19389 0.61144 -0.317 0.751
>> IV2 0.47041 0.11615 4.050
>> 5.12e-05 ***
>> IV3level2 0.30149 0.53837 0.560 0.575
>> IV3level3 0.15682 0.48760 0.322 0.748
>> IV3level4 -0.89664 0.18656 -4.806
>> 1.54e-06 ***
>> IV3level5 -2.90305 0.68119 -4.262
>> 2.03e-05 ***
>> IV3level6 -0.32081 0.29438 -1.090 0.276
>> IV3level7 -0.07038 0.87727 -0.080 0.936
>> (coefficients table of the sample dataset will differ)
>> I know that the results of anova comparisons and what the coefficients
>> table shows are two different things (as in the case of IV3 which also
>> significantly improves model quality when added to the regression even
>> if only few levels show significant contrasts).
>> My questions are:
>> How can I justify reporting my regression model when the regression
>> table shows only non-significant components for the b-spline term? (Is
>> it enough to point to the anova comparisons?)
>> Is is possible to keep only some components of the b-spline (as
>> suggested here for linear regression:
>> Is there a better way of modeling the data? I am not very familiar with
>> gamm4 or nlme, for example.
>> Any help is very much appreciated!
>> Thank you,
> R-sig-mixed-models using r-project.org mailing list
Institute of British Studies
More information about the R-sig-mixed-models