# [R] GAM Chi-Square Difference Test

Sat Jul 14 18:41:34 CEST 2012

```
We are relatively new users of GAM in mgcv (Wood) and wonder whether anyone
can advise us on a problem we encounter when analyzing many short time-series
datasets. For each dataset we fit four models, each with an intercept, a
predictor x (trend), z (treatment), and int (the interaction between x and z).
Our models are

Model 1: gama1.1 <- gam(y~x+z+int, family=quasipoisson) ## no smooths
Model 2: gama1.2 <- gam(y~x+z+s(int, bs="cr"), family=quasipoisson) ## smooth the interaction
Model 3: gama1.3 <- gam(y~s(x, bs="cr")+z+int, family=quasipoisson) ## smooth the trend
Model 4: gama1.4 <- gam(y~s(x, bs="cr")+z+s(int, bs="cr"), family=quasipoisson) ## smooth trend and interaction
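
# Illustrative sketch, not the authors' data or code: a self-contained
# version of the four models fitted to simulated counts. Assumptions
# (hypothetical): x is a time index, z is a 0/1 treatment indicator, and
# int = x * z. The object names m1..m4 and dat are also hypothetical.
library(mgcv)
set.seed(1)
n   <- 40
x   <- 1:n
z   <- rep(0:1, each = n / 2)
int <- x * z                                          # assumed form of the interaction
y   <- rpois(n, lambda = exp(1 + 0.02 * x + 0.5 * z)) # simulated counts
dat <- data.frame(y, x, z, int)

m1 <- gam(y ~ x + z + int, family = quasipoisson, data = dat)               # no smooths
m2 <- gam(y ~ x + z + s(int, bs = "cr"), family = quasipoisson, data = dat) # smooth interaction
m3 <- gam(y ~ s(x, bs = "cr") + z + int, family = quasipoisson, data = dat) # smooth trend
m4 <- gam(y ~ s(x, bs = "cr") + z + s(int, bs = "cr"),
          family = quasipoisson, data = dat)                                # both smoothed

# The kind of model comparison discussed below
print(anova(m1, m2, test = "Chisq"))
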

We have three questions. The first is simple. We occasionally obtain edf = 1
and Ref.df = 1 for some smoothed predictors (x, int). Because Wood says that
edf can be interpreted roughly as the degree of the functional form
(quadratic, cubic, etc.) + 1, this would imply an x^0 functional form for the
predictor, which doesn't make much sense. Does such a result for edf and
Ref.df indicate a problem (e.g., collinearity), or does it have a particular
interpretation?
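
# Illustrative sketch on hypothetical simulated data: for a penalized cubic
# regression spline (bs = "cr") the penalty does not shrink straight lines,
# and the centering constraint removes the constant, so a smooth term with
# edf = 1 is a linear function of its covariate. A very large fixed smoothing
# parameter (sp = 1e8, an arbitrary illustrative value) forces that limit,
# and the fit then nearly coincides with lm(y ~ x).
library(mgcv)
set.seed(2)
x <- 1:40
y <- 2 + 0.5 * x + rnorm(40, sd = 0.3)

fit_gam <- gam(y ~ s(x, bs = "cr"), sp = 1e8)  # smoothing parameter fixed, not estimated
fit_lm  <- lm(y ~ x)

edf_smooth <- summary(fit_gam)$edf             # edf of the smooth term, close to 1
max_diff   <- max(abs(fitted(fit_gam) - fitted(fit_lm)))  # largest gap between the two fits
print(c(edf = edf_smooth, max_diff = max_diff))
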

The other two questions concern which model fits the data best. We do look
at the usual fit statistics (R^2, deviance explained, etc.), but our
questions concern using the anova function for model comparisons, e.g.,

anova(gama2.1, gama2.2, test="Chisq")

1. Is there research on the power of this model comparison test? Anecdotally,
the test seems to reject the null even when the differences between models
appear small. Our time series are not very long, ranging from about 17 to
about 49 observations, so we would not have expected the test to have much
power.

2. More importantly, in a few cases we get a result like this:

anova(gamb1.1,gamb1.2, test="Chisq")
Analysis of Deviance Table

Model 1: y ~ x + z + int
Model 2: y ~ x + z + s(int, bs = "cr")
  Resid. Df Resid. Dev         Df   Deviance P(>|Chi|)
1        30     36.713
2        30     36.713 1.1469e-05 1.0301e-05 6.767e-05 ***

We are inclined to think that the small p-value here is simply a result of
rounding error in computing the df and deviance differences (both are
essentially zero), and that we should treat the two models as not different
from each other. Has anyone experienced this before? Is our interpretation
reasonable?
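
# Illustrative sketch: how a near-zero df difference can produce a
# "significant" p-value. Assumption: the Chisq test refers the deviance
# difference, divided by a dispersion estimate, to a chi-square distribution
# on the df difference. The dispersion used here (residual deviance /
# residual df from the table) is a hypothetical stand-in for the value
# anova() actually uses.
dev_diff <- 1.0301e-05     # deviance difference from the table above
df_diff  <- 1.1469e-05     # df difference from the table above
phi      <- 36.713 / 30    # hypothetical dispersion estimate

# A chi-square with df near 0 concentrates almost all of its probability
# extremely close to 0, so even a negligible statistic lands in its upper tail.
p_tiny_df <- pchisq(dev_diff / phi, df = df_diff, lower.tail = FALSE)
# The same statistic referred to a chi-square on 1 df is nowhere near significant.
p_one_df  <- pchisq(dev_diff / phi, df = 1, lower.tail = FALSE)
print(c(p_tiny_df = p_tiny_df, p_one_df = p_one_df))
# With these inputs p_tiny_df is comparable in size to the P(>|Chi|) value
# in the table, while p_one_df is close to 1.
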

Thanks to anyone who is able to offer advice.