[R-sig-ME] Significance of B-splines components in mixed-effects logistic regression (glmer)

Fri Sep 21 16:54:21 CEST 2018

Good afternoon,
I have a problem with reporting the significance of B-spline components  
in a mixed-effects logistic regression fit in lme4 (prompted by a  
reviewer's comment on a paper). After several hours of searching the  
literature, forums and the internet more generally, I have not found a  
solution and therefore turn to the recipients of this mailing list for  
help. (My questions are at the very end of the mail.)

I am trying to model the change in the use of a linguistic variable on  
the basis of corpus data. My dataset contains the binary dependent  
variable (DV, variant "a" or "b" being used), 2 grouping factors for  
random intercepts (RV1 and RV2, both categorical) and three predictors  
(IV1=time, IV2=another numeric variable, IV3=a categorical variable  
with 7 levels).

I wasn't sure if I should attach my (modified) dataset, so I have  
constructed an example instead. Unfortunately, it doesn't give the  
same results as my original dataset.

library(lme4)
library(splines)
library(languageR)

df <- dative[dative$Modality == "spoken",]
df <- df[,c("RealizationOfRecipient", "Verb", "Speaker",  
"LengthOfTheme", "SemanticClass")]
colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")
set.seed(130)
df$IV1 <- sample(1:13, nrow(df), replace = TRUE)  # nrow(df) is 2360 here

My final regression model looks like this (treatment contrast coding):
fin.mod <- glmer(DV~bs(IV1, knots=c(5,9), degree=1)+IV2+IV3+(1|RV1)+(1|RV2),
                  data=df, family=binomial)
summary(fin.mod, corr=FALSE)

where the effect of IV1 is modelled as a b-spline with 2 knots and a  
degree of 1. Anova comparisons (on the original dataset) show that  
this model performs significantly better than a) a model without the  
b-spline term for IV1 (i.e. without bs(IV1, knots=c(5,9), degree=1)),  
b) a model with IV1 as a plain linear predictor (not using bs), c) a  
model with the df of the spline specified instead of the knots (df=3),  
so that bs chooses the knots itself, and d) a model with only 2 df  
(bs(IV1, df=2, degree=1)). I also ran comparisons with models using  
quadratic or cubic splines, and my final model still performs  
significantly better.
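
To make the spline specification concrete, here is a small sketch of the basis that bs() builds for it. It uses only the splines package; the names IV1.values and B are just for illustration, and I assume that IV1 ranges from 1 to 13 as in the example data, so that the boundary knots are 1 and 13.

```r
library(splines)

# All values IV1 can take in the example data (assumption: range 1..13)
IV1.values <- 1:13

# Same specification as in the model: linear pieces, interior knots at 5 and 9
B <- bs(IV1.values, knots = c(5, 9), degree = 1)

# One basis column per spline coefficient in the regression table
dim(B)

# Basis values at the boundary knots (1, 13) and the interior knots (5, 9)
B[c(1, 5, 9, 13), ]
```

With degree 1 and two interior knots the basis has three columns, each a "hat" function: all three are 0 at IV1 = 1, and columns 1, 2 and 3 equal 1 at IV1 = 5, 9 and 13 respectively. Under this parameterisation each spline coefficient is the difference in log-odds between one of those points and IV1 = 1, so the individual z-tests are not tests of the overall effect of IV1.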

The problem is that I am reporting this final model in a paper, and  
one of the reviewers comments that I am reporting a non-significant  
effect of IV1 because according to the coefficients table the variable  
does not seem to have a significant effect (outlier correction does  
not make a big difference):

Fixed effects:
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                            0.52473    0.50759   1.034    0.301
bs(IV1, knots = c(5, 9), degree = 1)1 -0.93178    0.59162  -1.575    0.115
bs(IV1, knots = c(5, 9), degree = 1)2  0.69287    0.43018   1.611    0.107
bs(IV1, knots = c(5, 9), degree = 1)3 -0.19389    0.61144  -0.317    0.751
IV2                                    0.47041    0.11615   4.050 5.12e-05 ***
IV3level2                              0.30149    0.53837   0.560    0.575
IV3level3                              0.15682    0.48760   0.322    0.748
IV3level4                             -0.89664    0.18656  -4.806 1.54e-06 ***
IV3level5                             -2.90305    0.68119  -4.262 2.03e-05 ***
IV3level6                             -0.32081    0.29438  -1.090    0.276
IV3level7                             -0.07038    0.87727  -0.080    0.936
(coefficients table of the sample dataset will differ)

I know that the results of anova comparisons and what the coefficients  
table shows are two different things (as in the case of IV3, which also  
significantly improves model quality when added to the regression even  
if only a few levels show significant contrasts).
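
For reference, comparison a) from above can be sketched on the example data as follows. The model name mod.noIV1 is only for illustration, and because IV1 is random noise in the example, the test will not reproduce the result from my original data.

```r
library(lme4)
library(splines)
library(languageR)

# Same example data as above (spoken part of the dative dataset)
df <- dative[dative$Modality == "spoken",
             c("RealizationOfRecipient", "Verb", "Speaker",
               "LengthOfTheme", "SemanticClass")]
colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")
set.seed(130)
df$IV1 <- sample(1:13, nrow(df), replace = TRUE)

# Final model, and the same model without the spline term for IV1
fin.mod <- glmer(DV ~ bs(IV1, knots = c(5, 9), degree = 1) + IV2 + IV3 +
                   (1 | RV1) + (1 | RV2),
                 data = df, family = binomial)
mod.noIV1 <- glmer(DV ~ IV2 + IV3 + (1 | RV1) + (1 | RV2),
                   data = df, family = binomial)

# Likelihood-ratio test of all three spline coefficients at once
anova(mod.noIV1, fin.mod)
```

This tests the three spline coefficients jointly (3 degrees of freedom), whereas each row of the coefficients table tests one basis component on its own.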

My questions are:
How can I justify reporting my regression model when the regression  
table shows only non-significant components for the b-spline term? (Is  
it enough to point to the anova comparisons?)
Is it possible to keep only some components of the b-spline (as  
suggested here for linear regression:  
https://freakonometrics.hypotheses.org/47681)?
Is there a better way of modeling the data? I am not very familiar  
with gamm4 or nlme, for example.

Any help is very much appreciated!
Thank you,
Anne

-- 
Anne Lerche
Institute of British Studies
Leipzig University