[R] Binary response GLM Question
casperyc
casperyc at hotmail.co.uk
Sun Jun 5 03:09:16 CEST 2011
Hi all,
I have a problem with binary response data in GLM fitting.
The problem is that "y" takes only the values 1 or 0. If I use the logit
link, the model works on the log of the odds, where the odds are p/(1-p). In
my situation I think of "y" as "p", so sometimes the odds are 0 and sometimes
they are "1/0", which is (or should be) undefined. I wonder how R fits the glm?
The full details of this exercise are as follows:
----------------------------------------------------------------------------------------------------------
The data here are concerned with whether people default on a loan taken from
a particular bank and for identical interest rates and for a fixed period.
The information on each individual is their sex (male or female); their
income (in pounds), whether the person is a home owner or not, their age (in
years), and the amount of the loan (in pounds).
The information recorded is whether the individual defaulted on the loan or
not. Study the data and try to understand the relation between a person's
characteristics and defaulting. Specifically, what is your estimated
probability that a female aged 42, who is not a home owner, has an income of
23,500, and took a loan of 12,000, defaults on the loan?
The table holding the data has headings as follows:
m/f: male=1, female=0
age: age in years
home: home=1 is a home owner, home=0 is not a home owner
inc: income
loan: amount of loan
def: default=1, non-default=0.
----------------------------------------------------------------------------------------------------------
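My R code:
# Read the data and name the columns in the order of the table headings
Q3 <- read.table("tabl3.dat")
colnames(Q3) <- c("Sex", "Age", "Home", "Inc", "Loan", "Def")
# Treat the 0/1 indicator columns as factors
Q3$Sex <- as.factor(Q3$Sex)
Q3$Home <- as.factor(Q3$Home)
Q3$Def <- as.factor(Q3$Def)
# Logistic regression of default on all five covariates
Q3.mod <- glm(Def ~ Sex + Age + Home + Inc + Loan, data = Q3,
              family = binomial(link = "logit"))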
I don't really understand how R actually fits the model. Is there a "1/0"
that it has to calculate?
This does give me some results, but I don't feel quite right about them.
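As a rough illustration of why no "1/0" ever needs to be computed, here is a
minimal sketch of iteratively reweighted least squares (IRLS), the standard
way a logistic regression is fitted by maximum likelihood. The link is applied
to the fitted probability, not to the observed y, and the 0/1 values only
enter through the difference y - mu. The data below are simulated and made up
for illustration (they are not the tabl3.dat data), and the names x, X, beta
and the true coefficients are assumptions of the sketch.

```r
# Simulated data (made up for illustration; NOT the tabl3.dat data)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

X <- cbind(1, x)   # design matrix: intercept and one covariate
beta <- c(0, 0)    # starting values

for (iter in 1:25) {
  eta <- drop(X %*% beta)     # linear predictor
  mu  <- plogis(eta)          # fitted probabilities, strictly inside (0, 1)
  w   <- mu * (1 - mu)        # IRLS weights
  z   <- eta + (y - mu) / w   # working response: uses y - mu, never logit(y)
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Compare the hand-rolled estimates with glm()
fit <- glm(y ~ x, family = binomial(link = "logit"))
print(rbind(irls = beta, glm = coef(fit)))
```

The two rows should agree to several decimal places, since both are solving
the same likelihood equations.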
Now, if I use the empirical logit, which has a 0.5 correction,
log((y + 0.5)/(1 + 0.5 - y)), as the response and then regress it on the
explanatory variables, I get an estimated probability of 0.49***** (after
transforming the log odds back to p), whereas the previous model gives 0.
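A small sketch of what the empirical logit does when every observation is a
single 0/1 trial may help here. With one trial per row the transformed
response can take only two values, -log(3) and log(3), which correspond to
probabilities 0.25 and 0.75 on the original scale. A least-squares fit to
such a response is then just a linear regression on a rescaled y, which may
be why its back-transformed fitted values tend to sit nearer 0.5 than those
from the binomial GLM. The variable names below are assumptions of the
sketch.

```r
# Empirical logit for a single Bernoulli trial (n = 1 per observation)
y <- c(0, 1)
emp_logit <- log((y + 0.5) / (1 - y + 0.5))
print(emp_logit)           # the only two values the response can take

# Back on the probability scale these are 0.25 and 0.75
back <- plogis(emp_logit)
print(back)

# The transformed response is a linear rescaling of y:
# emp_logit == log(3) * (2 * y - 1)
rescaled <- log(3) * (2 * y - 1)
print(all.equal(emp_logit, rescaled))
```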
Am I wrong in the first place to think that the response is "y=default"?
How should I approach this?
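One possible approach to the question in the exercise, sketched under
assumptions: fit the binomial GLM with the 0/1 default indicator as the
response and ask predict() for the fitted probability at the covariate values
of interest. Because the attached file is not reproduced here, the data frame
Q3_sim below is simulated and entirely made up (including the coefficients
used to generate it), so the number it prints says nothing about the real
data. Only the column names follow the exercise.

```r
# Simulated stand-in for the loan data (made up; NOT the tabl3.dat data)
set.seed(42)
n <- 300
Q3_sim <- data.frame(
  Sex  = factor(rbinom(n, 1, 0.5), levels = c(0, 1)),   # male = 1, female = 0
  Age  = round(runif(n, 20, 65)),
  Home = factor(rbinom(n, 1, 0.5), levels = c(0, 1)),   # home owner = 1
  Inc  = round(runif(n, 10000, 50000)),
  Loan = round(runif(n, 1000, 20000))
)
# Hypothetical data-generating model, chosen only to give a mix of 0s and 1s
eta <- -1 + 0.0002 * Q3_sim$Loan - 0.0001 * Q3_sim$Inc
Q3_sim$Def <- rbinom(n, 1, plogis(eta))

mod <- glm(Def ~ Sex + Age + Home + Inc + Loan, data = Q3_sim,
           family = binomial(link = "logit"))

# Female, aged 42, not a home owner, income 23,500, loan 12,000
new_person <- data.frame(
  Sex  = factor(0, levels = c(0, 1)),
  Age  = 42,
  Home = factor(0, levels = c(0, 1)),
  Inc  = 23500,
  Loan = 12000
)

# type = "response" returns a probability rather than the log odds
p_hat <- predict(mod, newdata = new_person, type = "response")
print(p_hat)
```

With the real data, the same predict() call on Q3.mod would apply, provided
the factor levels in the new data frame match those in Q3.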
Thanks!
DATA is attached.
http://r.789695.n4.nabble.com/file/n3574478/tabl3.dat
--
View this message in context: http://r.789695.n4.nabble.com/Binary-response-GLM-Question-tp3574478p3574478.html
Sent from the R help mailing list archive at Nabble.com.