[R] Difference of AIC computation between R (>2.12) and Splus (7.0.6) during stepwise GAM analysis

Fri May 11 21:34:45 CEST 2012

Dear R Users,

I was wondering if some members of the list could shed some light on the 
difference in AIC computation existing between R (>2.12; gam package) 
and Splus (7.0.6). Because I am not a statistician by training, I would 
like to apologize in advance if I use the wrong terms or do not describe 
GAM appropriately.

As far as I understand, stepwise GAM analysis, as implemented in the gam 
package, relies on the gam, step.gam and gam.fit functions. The 
computation of AIC, which is used as the primary criterion to advance to 
the next step, is delegated to the family function provided by the user 
or set to "gaussian" by default. If one uses the "gaussian" default, AIC 
will be computed as:

AIC <- aic + 2*(n - fit$df.residual)

where:
- aic is the result of the function
aic <- function(y, n, mu, wt, dev) {
            nobs <- length(y)
            nobs * (log(dev/nobs * 2 * pi) + 1) + 2 - sum(log(wt))
}
- y is the vector of observations
- n is the number of observations with non-null weights
- mu is the vector of the fitted values
- wt is the vector of weights
- dev is the deviance
- fit is the object containing the fitting information
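
To make the formula above concrete, here is a minimal sketch that rebuilds the 
gaussian AIC by hand. It uses a plain glm fit from base R rather than the gam 
package, and the simulated data and the name aic_by_hand are only 
illustrative assumptions; the point is just that the two pieces (the family's 
aic value plus the 2*(n - df.residual) penalty) add up to what R reports.

```r
# Simulated data (illustrative only)
set.seed(1)
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.3)

# Plain gaussian glm from base R, standing in for a gam fit
fit <- glm(y ~ x, family = gaussian)

n   <- length(y)       # all weights are 1 here, so n = number of observations
wt  <- rep(1, n)       # unit prior weights
dev <- deviance(fit)   # residual sum of squares for the gaussian family

# Same expression as the aic function quoted above
aic <- n * (log(dev / n * 2 * pi) + 1) + 2 - sum(log(wt))

# Add the penalty for the degrees of freedom used by the mean model
aic_by_hand <- aic + 2 * (n - fit$df.residual)

# Compare with the value stored in the fit and with AIC()
all.equal(aic_by_hand, fit$aic)
all.equal(aic_by_hand, AIC(fit))
```

Note that dev/n inside the log is the maximum likelihood estimate of the 
variance, which is where the likelihood-based flavour of this criterion 
comes from.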

Stepwise GAM analysis, as implemented in Splus, relies on similar but 
somewhat different gam and step.gam functions. For instance, the 
computation of AIC does not depend on any family function in Splus. It 
is hard-coded and performed in the step.gam function and is based upon 
the formula given by Hastie and Pregibon (Hastie, T. J. and Pregibon, D. 
(1992) Generalized linear models. Chapter 6 of Statistical Models in S. 
eds J. M. Chambers and T J. Hastie, Wadsworth & Brooks/Cole.):

AIC <- dev + 2*(n - fit$df.residual)*deviance.lm(fit)/fit$df.resid
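
For comparison, the Splus formula above can be transcribed into R roughly as 
follows. This is only a sketch under assumptions: it again uses a base R glm 
fit on simulated data, deviance(fit) stands in for deviance.lm(fit), the 
dispersion is taken from the current fit (which fit Splus actually uses for 
this term is not something I have verified), and splus_style_aic is a made-up 
name.

```r
# Simulated data (illustrative only)
set.seed(1)
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.3)

# Plain gaussian glm from base R, standing in for a gam fit
fit <- glm(y ~ x, family = gaussian)

n        <- length(y)
dev      <- deviance(fit)            # residual deviance of the fit
df_model <- n - fit$df.residual      # degrees of freedom used by the model

# Dispersion estimated as deviance / residual df
# (assumption: taken from the current fit)
scale_hat <- deviance(fit) / fit$df.residual

# Splus-style criterion: deviance plus 2 * df * dispersion
splus_style_aic <- dev + 2 * df_model * scale_hat

# Side by side with the likelihood-based value reported by R
c(splus_style = splus_style_aic, r_style = AIC(fit))
```

The two numbers are on different scales (one is on the deviance scale, the 
other on the -2 log-likelihood scale), so only the ranking of candidate 
models within each criterion is meaningful, not the raw values.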

After running several GAM analyses in R and Splus, I see obvious 
differences in the computed AIC values and, thus, in the final model selection.

Overall, this looks to me like R relies on a maximum likelihood estimate 
of the dispersion, while Splus uses a non-parametric description of the 
dispersion. Is that right? I looked into the help pages but could not find 
anything specific on this point.

I guess my issue boils down to the following questions:
- is there a reference in the literature that would indicate the 
benefits and drawbacks of the two approaches?
- is there a way one can provide arguments to the R gam function so it 
behaves like the Splus function?

Thank you in advance for your feedback and your time.

Sebastien