[R] GAM selection error msgs (mgcv & gam packages)
scarrizo at cs.usyd.edu.au
Sun Jun 18 15:12:05 CEST 2006
Hi all,
My question concerns two error messages: one from the gam package and one
from the mgcv package (see below). I have read the help files and the
Chambers and Hastie book but still do not understand how to solve this
problem. Could you please tell me what I must adjust so that the commands
do not generate these messages?
I am trying to perform model selection for a GAM that will be used for
prediction, so my focus is on AIC. My data set has 3038 records, 116
predictor variables, and a binary response variable [0 or 1]. Nothing is
currently known about the predictors' relationship to the response, so I am
relying on the GAM to select appropriate predictors.
Thanks
Savrina
*mgcv package 1.3-12:
# I start by specifying the full model with 116 predictors, including an
# isotropic smooth of the 3D location variables (when I specify only the
# first 14 predictors I get no error message)
> m0<-gam(label~s(x,y,z,k=50)+s(feature4)+s(feature5)+s(feature6)+...+s(feature116),
    data=k.data, family=binomial)
Error in smooth.construct.tp.smooth.spec(object, data, knots):
A term has fewer unique covariate combinations than specified maximum
degrees of freedom
# I was going to follow this with backwards selection by hypothesis testing
# (remove the highest p-value term one at a time) and also an AIC comparison
# of all the models
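One possible reading of the mgcv error above is that some covariate (or covariate combination) takes fewer distinct values than the basis dimension requested for its smooth. A minimal sketch of how that could be checked in base R follows. The data frame `toy` and all its values are invented stand-ins for k.data, which is not shown in this post, and the threshold of 10 is the default k for a one-covariate term as quoted from the help file.

```r
# Hypothetical stand-in for k.data; every value here is invented.
n <- 200
toy <- data.frame(
  x = seq(0, 1, length.out = n),
  y = rev(seq(0, 1, length.out = n)),
  z = seq(0, 1, length.out = n)^2,
  feature4 = sqrt(seq_len(n)),          # 200 distinct values
  feature5 = rep(0:3, length.out = n)   # only 4 distinct values
)

k_default <- 10  # default basis dimension for a one-covariate s() term

# Number of distinct values of each single covariate
one_d <- c("feature4", "feature5")
n_unique <- sapply(toy[, one_d], function(v) length(unique(v)))

# Covariates with too few distinct values for the default basis dimension
too_few <- names(n_unique)[n_unique < k_default]
print(n_unique)
print(too_few)

# Distinct (x, y, z) combinations, to compare with k = 50 for s(x, y, z)
n_xyz <- nrow(unique(toy[, c("x", "y", "z")]))
print(n_xyz >= 50)
```

If a check like this flags a covariate in the real data, that term would be a candidate for a smaller k or for entering the model as a linear or factor term instead of a smooth.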
From the help file entitled 'Generalised additive models with integrated
smoothness estimation' I calculated the following. Where do I go from here?
A) "k is the basis dimension of a given term...if k is not specified
k=10*3^(d-1) where 'd' is the number of covariates for this term"
My calculations: for all my terms but the first d=1 thus k=10*3^0=10.
B) "You must have more unique combinations of covariates than the model has
total parameters"
My calculations: total parameters = sum of basis dimensions (50 + 10*113) +
sum of non-spline terms (0) - number of spline terms (114) = 1066
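The arithmetic in (A) and (B) can be written out as a short base R sketch. It only reproduces the calculation as stated in this post (it makes no claim about how mgcv itself counts parameters), and all variable names are made up for illustration.

```r
n_records    <- 3038             # rows in the data set
k_xyz        <- 50               # basis dimension given in s(x, y, z, k = 50)
n_1d_smooths <- 113              # s(feature4) ... s(feature116)
k_1d         <- 10 * 3^(1 - 1)   # default k for d = 1, from the quoted formula

sum_basis    <- k_xyz + k_1d * n_1d_smooths   # 50 + 10*113
n_smooths    <- 1 + n_1d_smooths              # the 3D smooth plus the 1D smooths
total_params <- sum_basis + 0 - n_smooths     # as in calculation (B)

print(c(sum_basis = sum_basis, n_smooths = n_smooths, total_params = total_params))
print(total_params < n_records)
```

Note that 3038 is the number of records; the number of unique covariate combinations could be smaller if any rows are duplicated.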
*gam package:
I think the stepwise selection provided by the gam package would be useful
for finding the best predictive model. I follow the example on p. 283 of
'Statistical Models in S' (Chambers and Hastie, 1993).
# I start with a full model where all predictors enter linearly
> k.start<-gam(label~., data=k.data, family=binomial)
# set up scope list with possibilities for each term eg .~1 + x + s(x)
# ignore the first column of the data set
> k.scope<-gam.scope(k.data[,-1])
# start step wise selection
> k.step<-step(k.start,k.scope)
#condensed output
Start: AIC=1549.48
label~x+y+z+feature4+feature5+...+feature116
             Df Deviance    AIC
<none>            1319.5 1549.5
- feature54  -1   1319.2 1551.2
- feature26  -1   1319.2 1551.2
...
- feature12  -1   1357.4 1589.4
There were 50 or more warnings (use warnings() to see the first 50)
# all 50 warnings are the same
> warnings()
Warning messages:
1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x[, jj,
drop = FALSE], y, wt, offset = object$offset, ...
# it does not seem to get past the original linear model. It should show all
# the steps taken to the final model
> k.step$anova
  Step Df Deviance Resid. Df Resid. Dev      AIC
1      NA       NA      2922   1317.599 1549.599
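For reference, the glm.fit warning listed by warnings() above can be reproduced on a tiny data set in which the response is perfectly separated by one predictor. This is only a sketch with invented toy data in base R; whether separation is what happens in k.data is an assumption, not something established in this post.

```r
# Invented toy data: y is 0 whenever x < 0 and 1 whenever x > 0.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Fit a logistic regression and collect the warning messages it raises
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)
print(fitted(fit))
```

With many predictors relative to the number of records, some combination of predictors separating the two classes becomes more likely, which would be one possible source of the repeated warning.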