[R] Why do two different calls to mgcv::gamm give different p-values?

Tue Oct 2 14:48:52 CEST 2018

Thanks a lot, Simon. That was a silly mistake on my part. After correcting,
there is still a slight difference between the p-values, but this is so
small that I guess it is just numerics.

Best,
Øystein

On Tue, Oct 2, 2018 at 2:26 PM Simon Wood <simon.wood using bath.edu> wrote:

> Dear Øystein,
>
> In your code 'id' is set up to be numeric. In your first gamm call it
> gets coerced to a factor (since nothing else would make sense). In  your
> second gamm call it is simply treated as numeric, since that could makes
> sense. To make the two models equivalent you just need to substitute:
>
> id <- as.factor(rep(1:ng, n/ng))
>
> into your loop.
>
> best,
> Simon
>
> On 01/10/18 08:15, Øystein Sørensen wrote:
> > I have longitudinal data where a response y_{ij} has been measured
> > repeatedly for i=1,...,n individuals, j=1,...,m times for each
> individual.
> > Let x_{ij} be the age of individual i at her/his jth measurement. I wish
> to
> > see how the response varies with age, and have reason to believe that a
> > nonlinear fit is needed. I hence wish to model the relationship using an
> > additive mixed model of the form y_{ij} = f(x_{ij}) + b_{i} +
> > \epsilon_{ij}, where f() is some smooth function to be estimated, b_{i}
> > \sim N(0, \sigma_{b}^{2}) is the random effect for individual i,
> > \epsilon_{ij} \sim N(0, \sigma^{2}) is the residual.
> >
> > Reading the documentation to the mgcv package, I see that such models can
> > be set up by calling the mgcv::gamm two different ways. One way shown to
> > set up such a model is with the call
> > b1 <- gamm(y ~ s(x), random = list(id =~ 1)),
> > where id is an indicator of the specific individual. In this way, the
> > random effect is specified in a list. The other way is to set up the
> random
> > effect with a smooth of the re:
> > b2 <- gamm(y ~ s(x) + s(id, bs = "re")).
> >
> > As far as I can understand, these two setups should create the same
> > underlying model matrices. However, when running them on my data, the
> > p-values for the smooth terms, as well as their estimated degrees of
> > freedom, are different.
> >
> > Below is a reproducible example on simulated data, which show that these
> > two types of specifying the models give different p-values and estimated
> > degrees of freedom. On my real data, these differences are sometimes even
> > more exaggerated.
> >
> > My question is: Are not these two calls to gamm suppose to estimate the
> > same model, or have I misunderstood?
> >
> > Here is a reproducible example:
> >
> > library(mgcv)
> > set.seed(100)
> > n_sim <- 100
> > df <- data.frame(
> >    model1_pval = numeric(n_sim),
> >    model1_edf = numeric(n_sim),
> >    model2_pval = numeric(n_sim),
> >    model2_edf = numeric(n_sim)
> > )
> >
> > # Number of observations
> > n <- 500
> > # Number of groups
> > ng <- 100
> >
> > for(i in 1:n_sim){
> >    # Draw the fixed effect covariate
> >    x <- rnorm(n)
> >    # Set up the group
> >    id <- rep(1:ng, n/ng)
> >    # Draw the random effect
> >    z <- rnorm(ng)
> >    # Draw the response
> >    y <- 0.1 * x + 0.2 * x^3 + z[id] + rnorm(n)
> >
> >    # Fit the two different models
> >    b1 <- gamm(y ~ s(x, k = 5), random = list(id =~ 1))
> >    b2 <- gamm(y ~ s(x, k = 5) + s(id, bs = "re"))
> >
> >    df[i, "model1_pval"] <- anova(b1$gam)$s.pv
> >    df[i, "model1_edf"] <- anova(b1$gam)$edf
> >    df[i, "model2_pval"] <- anova(b2$gam)$s.pv[[1]]
> >    df[i, "model2_edf"] <- anova(b2$gam)$edf[[1]]
> > }
> >
> > # Differences between model 1 and model 2 p values
> > mean(df$model1_pval)
> > #> [1] 6.790546e-21
> > mean(df$model2_pval)
> > #> [1] 3.090694e-14
> > max(abs(df$model1_pval - df$model2_pval))
> > #> [1] 2.770545e-12
> >
> > # Differences between model1 and model 2 estimated degrees of freedom
> > mean(df$model1_edf)
> > #> [1] 3.829992
> > mean(df$model2_edf)
> > #> [1] 3.731438
> > max(abs(df$model1_edf - df$model2_edf))
> > #> [1] 0.6320281
> >
> >
> > Thanks in advance,
> > Øystein Sørensen
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>

	[[alternative HTML version deleted]]