[R] OT: (quasi-?) separation in a logistic GLM
Gavin Simpson
gavin.simpson at ucl.ac.uk
Tue Dec 16 16:01:24 CET 2008
On Tue, 2008-12-16 at 13:31 +0100, vito muggeo wrote:
> Dear Gavin,
> I do not know whether this comment is still useful.
Very much so, thank you.
>
> Why are you unsure about quasi-separation?
> I think it is quite evident in the plot.
Unsure in the sense that I had been unable to ascertain what
quasi-complete separation was ;-)
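For a single predictor such as Dij, the two kinds of separation can be stated concretely. Complete separation means that some cut-point on the predictor puts every analog on one side and every non-analog on the other. Quasi-complete separation means the same split works only if observations from both groups are allowed to sit exactly on the cut-point. The helper separation_type() below is hypothetical and not from any package. It is a minimal sketch of that check for one numeric predictor. With several predictors, this simple range comparison is not sufficient.

```r
## Hypothetical helper (not from any package): classify separation for a
## binary response y and a single numeric predictor x, as in analogs ~ Dij
separation_type <- function(x, y) {
    y <- as.logical(y)
    x1 <- x[y]    # predictor values where the response is TRUE
    x0 <- x[!y]   # predictor values where the response is FALSE
    ## complete: a cut-point on x splits the two groups with no overlap
    if (max(x1) < min(x0) || max(x0) < min(x1))
        return("complete")
    ## quasi-complete: the groups touch only at one shared boundary value
    if (max(x1) == min(x0) || max(x0) == min(x1))
        return("quasi-complete")
    "none"
}

separation_type(c(1, 2, 3, 4), c(TRUE, TRUE, FALSE, FALSE))  # "complete"
separation_type(c(1, 2, 2, 3), c(TRUE, TRUE, FALSE, FALSE))  # "quasi-complete"
separation_type(c(1, 3, 2, 4), c(TRUE, TRUE, FALSE, FALSE))  # "none"
```

Applied to the data in this thread, with(dat, separation_type(Dij, analogs)) would report which case, if any, holds.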
I'm still not convinced about the quasi-separation issue, though. The
coefficients from the glm() fit are large, but their standard errors don't
suggest that anything is badly wrong.
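The standard errors are a useful diagnostic here. Under complete separation, the maximum likelihood estimate does not exist. glm() stops iterating at a large coefficient, and the reported standard error is larger still. The sketch below uses simulated data: x and y are made up for illustration and are not the dat from this thread.

```r
## Simulated, completely separated data (made up for illustration):
## every y = 1 has a larger x than every y = 0
x <- 1:10
y <- as.numeric(x > 5)

## glm() warns on these data; suppress the warnings to keep the output short
sep_fit <- suppressWarnings(glm(y ~ x, family = binomial))

## Estimate, Std. Error, z value and Pr(>|z|) for each coefficient
coef(summary(sep_fit))
```

The slope comes out large, and its standard error is much larger, so the Wald z is close to 0 despite perfect prediction. Finite, unremarkable standard errors such as those for the Dij fit are hard to reconcile with complete separation.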
I tried brglm() in the package of the same name, and it gave
effectively the same coefficients and standard errors as glm(), whereas I
would have expected them to differ considerably if (quasi-)separation
were an issue. I'm not very familiar with the approach behind brglm(),
however.
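On the approach behind brglm(): for logistic regression with the canonical logit link, the bias reduction it implements corresponds to Firth's penalised likelihood. This penalty adds half the log-determinant of the Fisher information to the log-likelihood and always gives finite estimates. The hypothetical firth_logit() below is a minimal base-R sketch of that idea, not brglm's actual code. It takes Newton-type steps on the modified score and applies a crude cap on step size, which is an assumption of this sketch. On simulated separated data, it shows the large gap between the two fits that separation would produce.

```r
## Hypothetical helper: minimal Firth-type bias-reduced logistic regression
## via Newton-type steps on the modified score (a sketch, not brglm's code)
firth_logit <- function(X, y, tol = 1e-8, maxit = 100, maxstep = 5) {
    beta <- rep(0, ncol(X))
    for (it in seq_len(maxit)) {
        p <- plogis(drop(X %*% beta))
        w <- p * (1 - p)
        info_inv <- solve(crossprod(X, w * X))      # (X'WX)^{-1}
        h <- w * rowSums((X %*% info_inv) * X)      # hat values
        U <- crossprod(X, y - p + h * (0.5 - p))    # Firth-modified score
        step <- drop(info_inv %*% U)
        ## crude damping (an assumption): cap the largest step component
        step <- step / max(1, max(abs(step)) / maxstep)
        beta <- beta + step
        if (max(abs(step)) < tol) break
    }
    setNames(beta, colnames(X))
}

## Simulated, completely separated data (made up for illustration)
x <- 1:10
y <- as.numeric(x > 5)
X <- cbind("(Intercept)" = 1, x = x)

firth_logit(X, y)                                       # finite slope
coef(suppressWarnings(glm(y ~ x, family = binomial)))   # very large slope
```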
I'll also take a look at the profiling you describe below once our
computing problems here are sorted out.
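Vito's profiling suggestion can be tried first on simulated data where the answer is known. The hypothetical profile_dev() below fixes the slope through offset(), as his loop does, so only the intercept is estimated at each grid value. For completely separated data, the curve keeps falling towards 0 as the slope grows. For overlapping data, it has an interior minimum at the maximum likelihood estimate. The data values are made up for illustration.

```r
## Hypothetical helper: deviance of a logistic GLM with the slope on x held
## fixed at each grid value (via offset()), so only the intercept is fitted
profile_dev <- function(x, y, grid) {
    vapply(grid, function(b) {
        ## extreme fitted probabilities trigger warnings at large b
        fit <- suppressWarnings(glm(y ~ offset(b * x), family = binomial))
        deviance(fit)
    }, numeric(1))
}

x <- 1:10
y_sep <- as.numeric(x > 5)                  # completely separated
y_ovl <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1)    # classes overlap
grid <- seq(0, 10, by = 0.1)

d_sep <- profile_dev(x, y_sep, grid)
d_ovl <- profile_dev(x, y_ovl, grid)

matplot(grid, cbind(d_sep, d_ovl), type = "l", lty = 1:2, col = 1,
        xlab = "fixed slope on x", ylab = "deviance")
legend("topright", legend = c("separated", "overlapping"), lty = 1:2)
```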
Apologies if people have had problems downloading the file from my web
space; we are having all sorts of filestore problems here this week.
Thanks again Vito for your comments,
G
>
> plot(analogs ~ Dij, data = dat)
>
> Also, it may be useful to look at the plot of the monotone (profile)
> deviance (or the log-likelihood) for the coefficient of Dij:
>
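> ## fix the Dij coefficient at each grid value via an offset, so that
> ## only the intercept is estimated (profiling out the intercept)
> xval <- seq(-20, 0, length.out = 50)
> ll <- numeric(length(xval))
> for (i in seq_along(xval)) {
>     fit <- glm(analogs ~ offset(xval[i] * Dij), data = dat, family = binomial)
>     ll[i] <- deviance(fit)
> }
>
> plot(xval, ll, xlab = "coefficient of Dij", ylab = "deviance")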
>
> Hope this helps you,
>
> vito
>
> Gavin Simpson wrote:
> > Dear List,
> >
> > Apologies for this off-topic post, but it is R-related in the sense that
> > I am trying to understand what R is telling me about the data to hand.
> >
> > ROC curves have recently been used to determine a dissimilarity
> > threshold for identifying whether or not two samples are from the same
> > "type". Given the bashing that ROC curves get whenever anyone asks about
> > them on this list (and having implemented the ROC methodology in my
> > analogue package), I wanted to try using glm() to model directly the
> > probability that two sites are analogues for one another at a given
> > dissimilarity.
> >
> > The data, then, are a logical vector ('analogs') indicating whether
> > the two sites come from the same vegetation type, and a vector of the
> > dissimilarities between the two sites ('Dij'). These are in a csv file
> > currently in my university web space. Each 'row' in this file
> > corresponds to a single comparison between two sites.
> >
> > When I analyse these data using glm(), I get the familiar "fitted
> > probabilities numerically 0 or 1 occurred" warning. The data do not look
> > linearly separable when plotted (code for which is below). I have read
> > Venables and Ripley's discussion of this in MASS4 and other sources that
> > discuss this warning and R (Faraway's Extending the Linear Model with R
> > and John Fox's new Applied Regression Analysis and Generalized Linear
> > Models, 2nd Ed), as well as some of the literature on Firth's bias
> > reduction method. But I am still somewhat unsure what (quasi-)separation
> > is and whether it is the reason for the warnings in this case.
> >
> > My question, then, is: is this a separation issue with my data, or is
> > it the quasi-separation that I have read a bit about whilst researching
> > this problem? Or is it something else entirely?
> >
> > Code to reproduce my problem with the actual data is given below. I'd
> > appreciate any comments or thoughts on this.
> >
> > #### Begin code snippet ################################################
> >
> > ## note: the data file is ~93 KB in size
> > dat <- read.csv(url("http://www.homepages.ucl.ac.uk/~ucfagls/dat.csv"))
> > head(dat)
> > ## fit model --- produces warning
> > mod <- glm(analogs ~ Dij, data = dat, family = binomial)
> > ## plot the data
> > plot(analogs ~ Dij, data = dat)
> > fit.mod <- fitted(mod)
> > ord <- with(dat, order(Dij))
> > with(dat, lines(Dij[ord], fit.mod[ord], col = "red", lwd = 2))
> >
> > #### End code snippet ##################################################
> >
> > Thanks in advance
> >
> > Gavin
>
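On the original question of why the warning appears at all: glm.fit() issues "fitted probabilities numerically 0 or 1 occurred" whenever any fitted probability lies within 10 * .Machine$double.eps of 0 or 1. That can happen without any separation. It is enough for a few observations to have predictor values far enough out that their fitted probabilities round to 0 or 1. The sketch below uses made-up, overlapping data with one extreme x value. It is an illustration, not a diagnosis of the Dij data.

```r
## Overlapping (non-separated) data plus one extreme x value;
## these values are made up for illustration, not taken from this thread
x <- c(1:10, 200)
y <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)

## fit the model, capturing warning messages instead of printing them
warn <- NULL
fit <- withCallingHandlers(
    glm(y ~ x, family = binomial),
    warning = function(w) {
        warn <<- c(warn, conditionMessage(w))
        invokeRestart("muffleWarning")
    })
warn

## the same tolerance glm.fit() uses for its warning; only the point at
## x = 200 is flagged, although the classes clearly overlap
eps <- 10 * .Machine$double.eps
which(fitted(fit) > 1 - eps | fitted(fit) < eps)
```

The same which() check on fitted(mod) for the Dij model would show whether only a handful of extreme-dissimilarity comparisons trigger the warning there.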
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%