[R] mgcv, testing gamm vs lme, which degrees of freedom?

Fri Jun 18 17:03:35 CEST 2010

On Wednesday 16 June 2010 20:33, Carlo Fezzi wrote:
> Dear all,
>
> I am using the "mgcv" package by Simon Wood to estimate an additive mixed
> model in which I assume normal distribution for the residuals. I would
> like to test this model vs a standard parametric mixed model, such as the
> ones which are possible to estimate with "lme".
>
> Since the smoothing splines can be written as random effects, is it
> correct to use an (approximate) likelihood ratio test for this? 
-- yes this is ok (subject to the usual caveats about testing on the boundary 
of the parameter space) but your 2 example models below will have  the same 
number of degrees of freedom!

> If so, 
> which is the correct number of degrees of freedom? 
--- The edf from the lme object, if you are testing using the log likelihood 
returned by the  lme representation of the model.

> Sometime the function 
> LogLik() seems to provide strange results regarding the number of degrees
> of freedom (df) for the gam, for instance in the example I copied below
> the df for the "gamm" are equal to the ones for the "lme", but the
> summary(model.gam) seems to indicate a much higher edf for the gamm.
--- the edf for the lme representation of the model counts only the fixed 
effects + the variance parameters (which includes smoothing parameters). Each 
smooth typically contributes only one or two fixed effect parameters, with 
the rest of the coefficients for the smooth treated as random effects.

--- the edf for the gam representation of the same model differs in that it 
also counts the *effective* number of parameters used to represent each 
smooth: this includes contributions from all those coefficients that the lme 
representation treated as strictly random. 

best,
Simon


> I would be very grateful to anybody who could point out a solution,
>
> Best wishes,
>
> Carlo
>
> Example below:
>
> ----
>
> rm(list = ls())
> library(mgcv)
> library(nlme)
>
> set.seed(123)
>
> x  <- runif(100,1,10)				# regressor
> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)	# random intercept
> id <- rep(1:10, each=10) 			# identifier
>
> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>
> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  # lme model
>
> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm
>
> ## same number of "df" according to logLik:
> logLik(f1)
> logLik(f2$lme)
>
> ## much higher edf according to summary:
> summary(f2$gam)
>
> -----------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283