[R] Binary response GLM Question

Joshua Wiley jwiley.psych at gmail.com
Sun Jun 5 04:35:28 CEST 2011


Hi,

Y is not the same as P.  P is the conditional probability that Y = 1
given the data matrix.  So theoretically, P can take on any value in
[0, 1], which means the odds, P/(1 - P), can be anywhere in
[0, +infinity), not just 0 or undefined.  R never computes
log(Y/(1 - Y)); the model is fit by maximizing the likelihood, which
only involves P.  In logistic regression, the logit link is pretty
standard, so I do not think you would need to use the empirical logit.
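A minimal sketch of why a 0/1 response causes no "1/0" problem.  The data
below are simulated and the names (x, y, fit) are made up for
illustration; this is not the poster's tabl3.dat.  The hand-written
Bernoulli log-likelihood only uses log(p) and log(1 - p) of the fitted
probabilities, and it should match logLik() for the glm fit.  It also
shows that the 0.5-corrected empirical logit of a 0/1 response can only
take two values.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# Simulated 0/1 response from a true logistic model (illustration only)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))
p <- fitted(fit)  # estimated P(Y = 1 | x), strictly inside (0, 1)
range(p)

# Bernoulli log-likelihood written out by hand: only log(p) and log(1 - p)
# of the fitted probabilities appear, so y is never divided by (1 - y).
ll_by_hand <- sum(y * log(p) + (1 - y) * log(1 - p))
all.equal(ll_by_hand, as.numeric(logLik(fit)))

# Empirical logit with the 0.5 correction, log((y + 0.5) / (1 - y + 0.5)):
# for a 0/1 response it takes only two values, -log(3) and log(3).
emp_logit <- log((y + 0.5) / (1 - y + 0.5))
sort(unique(emp_logit))
```

Because the empirical-logit "response" is just two constants, regressing
it on the predictors and back-transforming tends to pull the predicted
probabilities toward 0.5.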

I am not sure how much detail you want when you ask how R fits the
glm.  It uses an iterative algorithm, iteratively reweighted least
squares (IRLS), carried out by glm.fit().  If you are willing to spend
the time to work through the code, you can learn a lot.  Just type
binomial at the console (no quotes, no () after it) and the source for
the binomial family will print; you can see how the link, variance,
and deviance pieces are set up (the logit link itself comes from
make.link()).  That family object gets passed to glm() to use in
fitting the model.  For a more general explanation of the process, I
would get a book or look online for information on logistic regression
or maximum likelihood estimation.
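For the curious, here is a minimal sketch of IRLS for the logit link,
run on simulated data.  The helper irls_logit() and its tolerance are
hypothetical, made up for illustration; the real glm.fit() also handles
prior weights, offsets, step-halving and convergence on the deviance.
The starting values mu = (y + 0.5)/2 mirror what the binomial family's
initialize step uses for 0/1 data.  Note that the working response uses
the current fitted mu, never log(y/(1 - y)).

```r
# Minimal IRLS for logistic regression (illustration; irls_logit is a
# hypothetical helper, not part of R)
irls_logit <- function(X, y, tol = 1e-10, max_iter = 50) {
  mu   <- (y + 0.5) / 2          # starting fitted probabilities, all in (0, 1)
  eta  <- qlogis(mu)             # linear predictor on the logit scale
  beta <- rep(0, ncol(X))
  for (i in seq_len(max_iter)) {
    w <- mu * (1 - mu)           # IRLS weights: Bernoulli variance at mu
    z <- eta + (y - mu) / w      # working response: uses mu, not log(y/(1-y))
    # weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- solve(crossprod(X, w * X), crossprod(X, w * z))
    eta <- drop(X %*% beta_new)
    mu  <- plogis(eta)           # inverse logit keeps mu inside (0, 1)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  drop(beta)
}

set.seed(2)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 - 0.8 * x))   # simulated 0/1 response
X <- cbind("(Intercept)" = 1, x = x)
beta_irls <- irls_logit(X, y)
beta_glm  <- coef(glm(y ~ x, family = binomial))
all.equal(unname(beta_irls), unname(beta_glm), tolerance = 1e-6)
```

Each pass is just a weighted linear regression, which is why the
algorithm is called iteratively reweighted least squares.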

Cheers,

Josh

On Sat, Jun 4, 2011 at 6:09 PM, casperyc <casperyc at hotmail.co.uk> wrote:
> Hi all,
>
> I have a problem with binary response data in GLM fitting.
> The problem is that "y" takes only the values 1 or 0, and if I use the logit link, it is
> the log of the odds, p/(1-p). In my situation, if I think of "y" as
> "p", sometimes the odds are 0 and sometimes they are "1/0", which should be
> undefined. I wonder how R fits the glm?
>
> The FULL details of this exercise are as follows:
> ----------------------------------------------------------------------------------------------------------
> The data here are concerned with whether people default on a loan taken from
> a particular bank, at identical interest rates and for a fixed period.
> The information on each individual is their sex (male or female), their
> income (in pounds), whether the person is a home owner or not, their age (in
> years), and the amount of the loan (in pounds).
>
> The information recorded is whether the individual defaulted on the loan or
> not. Study the data and try to understand the relation between a person's
> characteristics and defaulting. Specifically, what is your estimated
> probability that a female aged 42, who is not a home owner, has an income of
> 23,500, and took a loan of 12,000, defaults on the loan?
>
> The table holding the data has headings as follows:
>
> m/f: male=1, female=0
> age: age in years
> home: home=1 is a home owner, home=0 is not a home owner
> inc: income
> loan: amount of loan
> def: default=1, non-default=0.
>
> ----------------------------------------------------------------------------------------------------------
>
> my R code
>
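> # read the data and name the columns
> Q3=read.table("tabl3.dat")
> colnames(Q3)=c("Sex","Age","Home","Inc","Loan","Def")
> # treat the 0/1 codes as categorical
> Q3$Sex=as.factor(Q3$Sex)
> Q3$Home=as.factor(Q3$Home)
> Q3$Def=as.factor(Q3$Def)
>
> # logistic regression of default on all five predictors
> Q3.mod=glm(Def~Sex+Age+Home+Inc+Loan,data=Q3,family=binomial(logit))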
>
> I don't really understand HOW R actually fits the model if there is a "1/0"
> that it has to calculate.
> This does give me some results, but I don't quite feel right about them.
>
> Now,
>
> if I use the empirical logit, which has a 0.5 correction, log((y+0.5)/
> (1+0.5-y)), as the response and then regress it on the explanatory variables, I
> get an estimated probability of 0.49***** (when I transform the log
> odds back to p), whereas the previous model gives 0.
>
> Am I wrong in the first place to think that the response is "y=default"?
> How should I approach this?
>
> Thanks!
>
>
> DATA is attached.
>
> http://r.789695.n4.nabble.com/file/n3574478/tabl3.dat tabl3.dat
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Binary-response-GLM-Question-tp3574478p3574478.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
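On the specific question: once the model is fit, the estimated
probability comes from predict() with type = "response", which applies
the inverse logit to the linear predictor.  A minimal sketch, assuming
the column names from the code above; the data frame here is simulated
stand-in data (made-up numbers, not tabl3.dat), so the resulting
probability means nothing by itself.  With a factor response such as
Def, glm() treats the first level ("0", non-default) as failure and the
other level as success.

```r
set.seed(3)
n <- 300
# Simulated stand-in for tabl3.dat (made-up numbers, same column layout)
Q3 <- data.frame(
  Sex  = factor(rbinom(n, 1, 0.5)),        # male = 1, female = 0
  Age  = round(runif(n, 20, 65)),
  Home = factor(rbinom(n, 1, 0.5)),        # home owner = 1
  Inc  = round(runif(n, 10000, 60000)),
  Loan = round(runif(n, 1000, 20000))
)
eta <- -1 + 0.0001 * Q3$Loan - 0.00005 * Q3$Inc
Q3$Def <- factor(rbinom(n, 1, plogis(eta)))  # default = 1 (made-up model)

Q3.mod <- glm(Def ~ Sex + Age + Home + Inc + Loan, data = Q3,
              family = binomial(link = "logit"))

# New individual: female, aged 42, not a home owner, income 23,500,
# loan 12,000. Factor columns must use the same levels as the fitted data.
new_person <- data.frame(
  Sex  = factor(0, levels = levels(Q3$Sex)),
  Age  = 42,
  Home = factor(0, levels = levels(Q3$Home)),
  Inc  = 23500,
  Loan = 12000
)
p_hat <- predict(Q3.mod, newdata = new_person, type = "response")
p_hat
```

Run on the real data, the same predict() call answers the exercise
directly, and the result is a probability strictly between 0 and 1.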



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


