[R] 'mgcv' package, problem with predicting binomial (logit) data

David Winsemius dwinsemius at comcast.net
Mon Aug 30 21:45:26 CEST 2010


On Aug 30, 2010, at 11:44 AM, Jef Vlegels wrote:

> There was a problem with the data in attachment, here is a better  
> version.
>
> Use something like:
> data <- read.table("pas_r.txt",header=TRUE,sep=";")
> to read it.

I did, but I hate dealing with attached datasets and also am  
uncomfortable dealing with datasets named data, so I messed you code  
up a bit to avoid attach()-ment. I also find it cleaner to use subsets  
of dataframes rather htna coding new variables. Eventually it came  
down to removing the NA for the participant column. I didn't see the  
need for the order() operation and thought it might create problems so  
I just commented it out. (You also had  of misspellings of your model  
names.)

test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),  
family=binomial(logit), data=data.age)
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data.age,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(data.age$age)
with(data.age, plot(age[I1], test.pred$fit[I1],lty=1, type="l"))
with(data.age, lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2))
with(data.age, lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2))
with(data.age, plot(age[I1], test.pred$fit[I1] , type="l"))

#In a second step, I want to calculate a similar model, but only for
#respondents with a certain characteristic. For example, in this case,  
only
#for male respondents.
#I use a code that looks like this:

#participant_male <- participant[gender=="male"]
# age_male <- age[gender=="male"]

test.males<-gam(participant ~ s(age, fx=FALSE, bs="cr"),  
data=subset(data.age, gender=="male"&!is.na(participant)),
family=binomial(logit), na.action=na.omit)
summary(test.males)
plot(test.males, shade=TRUE)

#I get a nice smoother function in this plot, like I expected.
#Then, when I want to plot the predicted values, I use a code that  
looks like
#this:

Test2.pred <- predict(test.males,newdata=subset(data.age,  
gender=="male"&!is.na(participant)), se.fit=TRUE, type="response")
#I1<-with(subset(data.age, gender=="male"), order(age))
with(subset(data.age, gender=="male"&!is.na(participant)), plot(age,  
Test2.pred$fit, lty=1) )

I have not figured out why the NA's caused so much trouble but was  
eventually driven to exclude them in the subset step by mismatched  
lenght errors from plot(),


>
> Thanks,
> Jef
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: maandag 30 augustus 2010 17:22
> To: Jef Vlegels
> Subject: FYI offlist Re: [R] 'mgcv' package, problem with predicting
> binomial (logit) data
>
>
> On Aug 30, 2010, at 11:17 AM, Jef Vlegels wrote:
>
>> Dear R-help list,
>>
>> I'm using the mgcv package to plot predictions based on the gam
>> function.
>>
>> I predict the chance of being a (frequent) participant at theater
>> plays vs.
>> not being a participant by age.
>> Because my outcome variable is dichotomous, I use the binomial
>> family with
>> logit link function.
>>
>> Dataset in attachment, code to read it in R:
>
> Nope. .sav files are not accepted by the mail-server. See the Posting
> Guide. (.txt and .pdf are the safest formats)
>
>
>> data <- read.spss("pas_r.sav")
>> attach(data)
>>
>> In a first step I use 'gam' to model my data and 'predict' to
>> calculate and
>> plot the predicted values, this all works fine.
>> My code looks like this:
>>
>> test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),
>> family=binomial(logit))
>> summary(test)
>> plot(test, shade=TRUE)
>> gam.check(test)
>> test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response',
>> na.action=na.omit)
>> I1<-order(age)
>> plot(age[I1], test.pred$fit[I1],lty=1, type="l")
>> lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2)
>> lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2)
>> plot(age[I1], test.pred$fit[I1] , type="l")
>>
>> I a second step, I want to calculate a similar model, but only for
>> respondents with a certain characteristic. For example, in this
>> case, only
>> for male respondents.
>> I use a code that looks like this:
>>
>> participant_male <- participant[gender=="male"]
>> age_male <- age[gender=="male"]
>>
>> test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"),
>> family=binomial(logit), na.action=na.omit)
>> summary(test2)
>> plot(test2, shade=TRUE)
>>
>> I get a nice smoother function in this plot, like I expected.
>> Then, when I want to plot the predicted values, I use a code that
>> looks like
>> this:
>>
>> Test2.pred <- predict(test5,se.fit=TRUE, type="response")
>> I1<-order(age_male)
>> plot(age_male[I1], test2.pred$fit[I1],lty=1)
>>
>> This last plot, of the predictions, is not what I expect. It's just
>> a random
>> scatterplot, not what I would expect from the smoother plot. Does
>> anybody
>> know what I did wrong?
>>
>> Thanks in advance,
>> Jef Vlegels
>>
>> Jef Vlegels
>> Ghent University - Department of Sociology
>> Korte Meer 3, B-9000 Gent, Belgium
>> Tel:  09 264 8343
>> www.psw.UGent.be/sociologie
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> <pas_r.txt>______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list