[R] using GAM to assess the linearity in logistic regression

ronggui 0034058 at fudan.edu.cn
Sat Apr 2 08:12:44 CEST 2005


Maybe the idea is simple, but the details are beyond me. You are right, GAM can capture non-linearity. But if the results from the GAM show little evidence of non-linearity, then we can assume that linearity holds. Am I right?
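To make the question concrete, here is a minimal sketch of the check I have in mind, using gam from package mgcv. The data are simulated with a logit that is truly linear in x, and the names x, y, fit_lin and fit_gam are made up for illustration. This is only one possible way to do it, not a recommended procedure.

```r
library(mgcv)  # recommended package shipped with R (Simon Wood's gam)

set.seed(1)
n <- 500
x <- runif(n, -2, 2)
# Simulated response: the true logit is linear in x (an assumption of this sketch)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

# Same model twice: x entered linearly, and x entered as a penalized smooth
fit_lin <- gam(y ~ x,    family = binomial, method = "REML")
fit_gam <- gam(y ~ s(x), family = binomial, method = "REML")

# Effective degrees of freedom of s(x); a value close to 1 means the
# estimated smooth is close to a straight line on the logit scale
edf_x <- summary(fit_gam)$s.table[1, "edf"]
print(edf_x)

# AIC of the two fits side by side
aic <- AIC(fit_lin, fit_gam)
print(aic)
```

Calling plot(fit_gam, se = TRUE) afterwards draws the estimated smooth with a standard-error band, which is the plot one would inspect for curvature. An edf near 1 and a band that can contain a straight line only say that the data show little evidence against linearity; they do not prove it.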

from agresti(2002):
...
Before fitting the model and making such interpretations,
look at the data to check that the logistic regression model is appropriate.
Since Y takes only values 0 and 1, it is difficult to check this by plotting Y
against x.
It can be helpful to plot sample proportions or logits against x. ... When X is continuous and all n_i = 1, or when it is essentially continuous
and all n_i are small, this is unsatisfactory. One could group the data with
nearby x values into categories before calculating sample proportions and
sample logits. A better approach that does not require choosing arbitrary
categories uses a smoothing mechanism to reveal trends. One such smoothing
approach fits a generalized additive model (Section 4.8), which replaces the
linear predictor of a GLM by a smooth function. Inspect a plot of the fit
to see if severe discrepancies occur from the S-shaped trend predicted
by logistic regression.
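The grouping approach mentioned in this passage can be sketched in R roughly as follows. The data are simulated, the ten equal-width bins are an arbitrary choice, and the 0.5 adjustment in the empirical logit is there to keep bins with no successes or no failures finite. All variable names are made up for illustration.

```r
set.seed(2)
n <- 1000
x <- runif(n, 0, 10)
# Simulated binary response with a logit that is linear in x (assumption)
y <- rbinom(n, 1, plogis(-2 + 0.4 * x))

# Group nearby x values into 10 equal-width categories
bins <- cut(x, breaks = seq(0, 10, by = 1), include.lowest = TRUE)
n_i  <- as.vector(table(bins))          # observations per category
y_i  <- as.vector(tapply(y, bins, sum)) # successes per category
mid  <- seq(0.5, 9.5, by = 1)           # category midpoints

# Empirical (sample) logit with a 0.5 adjustment in numerator and denominator
emp_logit <- log((y_i + 0.5) / (n_i - y_i + 0.5))

print(data.frame(mid, n_i, y_i, emp_logit))
```

Plotting emp_logit against mid, for example with plot(mid, emp_logit), should give a roughly straight line when the logistic model is appropriate. The picture depends on the bin width chosen, which is the drawback the quoted text points out.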

from "S-PLUS (and R) Manual to Accompany
Agresti's Categorical Data Analysis (2002)" (2nd edition, Laura A. Thompson, 2005):

Prior to fitting a logistic regression model to data, one should check the assumption of a logistic relationship between the response and explanatory variables. A simple way
to do this is to use the linear relationship between the logit and the explanatory variable. The values of the explanatory variable can be plotted against the sample logits (p. 168, Agresti) at those values. The plot should look roughly linear for a logistic model to be appropriate. If there are not enough response data at each unique x value (and categorizing x values is undesirable), then the technique of the last section in Chapter 4 can be used (i.e., GAM). There, we saw that a sigmoidal (or S-shaped) trend
appeared in the plot of the response by predictor (Figure 4.7, Agresti).

 from MASS:
....
    Residuals are not always very informative with binary responses but at least
none are particularly large here.
    An alternative approach is to predict the actual live birth weight and later
threshold at 2.5 kilograms. This is left as an exercise for the reader; surprisingly
it produces somewhat worse predictions with around 52 errors.
      We can examine the linearity in age and mother's weight more flexibly using
generalized additive models. These stand in the same relationship to additive
models (Section 8.8) as generalized linear models do to regression models; replace
the linear predictor in a GLM by an additive model, the sum of linear and
smooth terms in the explanatory variables. We use function gam from S-PLUS.
(R has a somewhat different function gam in package mgcv by Simon Wood.)
> attach(bwt)
> age1 <- age*(ftv=="1"); age2 <- age*(ftv=="2+")
> birthwt.gam <- gam(low ~ s(age) + s(lwt) + smoke + ptd +
ht + ui + ftv + s(age1) + s(age2) + smoke:ui, binomial,
bwt, bf.maxit=25)
> summary(birthwt.gam)
Residual Deviance: 170.35 on 165.18 degrees of freedom
DF for Terms and Chi-squares for Nonparametric Effects
        Df Npar Df Npar Chisq  P(Chi)
s(age)   1     3.0     3.1089 0.37230
s(lwt)   1     2.9     2.3392 0.48532
s(age1)  1     3.0     3.2504 0.34655
s(age2)  1     3.0     3.1472 0.36829
> table(low, predict(birthwt.gam) > 0)
    FALSE TRUE
  0   115   15
  1    28   31
> plot(birthwt.gam, ask = T, se = T)
Creating the variables age1 and age2 allows us to fit smooth terms for the difference
in having one or more visits in the first trimester. Both the summary and
the plots show no evidence of non-linearity. The convergence of the fitting algorithm
is slow in this example, so we increased the control parameter bf.maxit
from 10 to 25. The parameter ask = T allows us to choose plots from a menu.
Our choice of plots is shown in Figure 7.2.
See Chambers and Hastie (1992) for more details on gam .
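
Since the excerpt above uses the S-PLUS gam, here is a rough sketch of a similar fit in R with mgcv. It is a simplified version, not a translation: it keeps smooth terms only for age and lwt, drops ftv, the age1/age2 terms and the smoke:ui interaction, and recodes the indicators directly from the birthwt data in package MASS. The object names bw and fit are made up, and the numbers will not match the S-PLUS output quoted above.

```r
library(MASS)  # provides the birthwt data frame
library(mgcv)  # Simon Wood's gam, not the S-PLUS function

# Recode the indicators as logicals; ptd flags any previous premature labour
bw <- transform(birthwt,
                smoke = smoke > 0,
                ptd   = ptl > 0,
                ht    = ht > 0,
                ui    = ui > 0)

# Smooth terms for age and mother's weight, linear terms for the rest
fit <- gam(low ~ s(age) + s(lwt) + smoke + ptd + ht + ui,
           family = binomial, data = bw, method = "REML")

# Approximate tests and effective degrees of freedom for the smooth terms
print(summary(fit)$s.table)

# Classification table on the link scale, as in the excerpt above
print(table(observed = bw$low, predicted = predict(fit) > 0))
```

In mgcv the smoothness is chosen by the fitting criterion (REML here), so an effective degrees of freedom close to 1 for s(age) or s(lwt) plays a role similar to the non-significant "Npar" tests in the S-PLUS summary. plot(fit, se = TRUE, pages = 1) shows the fitted smooths.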




On Fri, 01 Apr 2005 23:37:13 -0500
Wensui Liu <liuwensui at gmail.com> wrote:

> I am a little confused about what you asked. 
> 
> If you want to assess the linearity in logistic regression, why do you
> want to use GAM instead of GLM?
> 
> As far as I understand, GAM is used to capture nonlinearity rather than linearity.
> 
> Am I right here?
> 
> 
> On Apr 1, 2005 10:19 PM, ronggui <0034058 at fudan.edu.cn> wrote:
> > As Agresti (2002) points out, we had better screen the data to see whether logit(pi) and the predictor have a linear relationship in logistic regression. I found some material in MASS and in the S-PLUS reference, but it is rather brief, and I cannot quite work out how to assess linearity in logistic regression. Can anyone suggest some materials?
> > 
> > I am not familiar with GAM, but I think there may be some material that would let me use GAM to assess linearity in logistic regression without mastering the GAM model. Is that right?
> > 
> > thank you!
> > 
> > 
> 
> 
> -- 
> WenSui Liu, MS MA
> Senior Decision Support Analyst
> Division of Health Policy and Clinical Effectiveness
> Cincinnati Children Hospital Medical Center



