[R] logistic regression TRY LOGISTF

Jeff Miller jeffmiller at alphapoint05.net
Thu Mar 15 21:58:26 CET 2007

If Ted is right, then one work-around is to use Firth's method for penalized
log-likelihood. The technique was originally intended to reduce small-sample
bias. However, it is now also used to deal with complete and quasi-complete
separation problems.

I believe the R package is called logistf, but I haven't had a chance to try
it. I know the SAS version (called the fl macro) works fine.
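
For illustration only, here is a rough base-R sketch of the idea applied to
the data quoted below. It maximizes the log-likelihood plus Firth's penalty
0.5 * log det(X' W X) with optim(). The names pen_loglik and firth_est are
made up for this example, and this is not how logistf itself is implemented.
The guarded call at the end assumes logistf is installed and that its
logistf(formula, data) interface is used as shown.

```r
x <- 1:10
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
X <- cbind(1, x)  # design matrix: intercept column and x

# Firth-penalized log-likelihood: loglik + 0.5 * log det(X' W X),
# where W = diag(p * (1 - p)) comes from the Fisher information.
pen_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  ll <- sum(y * plogis(eta, log.p = TRUE) +
            (1 - y) * plogis(-eta, log.p = TRUE))
  w <- plogis(eta) * plogis(-eta)      # p * (1 - p)
  info <- crossprod(X, X * w)          # X' W X
  ll + 0.5 * as.numeric(determinant(info, logarithm = TRUE)$modulus)
}

# fnscale = -1 makes optim() maximize instead of minimize
opt <- optim(c(0, 0), pen_loglik, control = list(fnscale = -1, maxit = 2000))
firth_est <- opt$par
print(firth_est)  # finite intercept and slope despite the separation

# If the logistf package happens to be installed, compare with its fit
if (requireNamespace("logistf", quietly = TRUE)) {
  fit <- logistf::logistf(y ~ x, data = data.frame(x = x, y = y))
  print(fit$coefficients)
}
```

The point of the penalty is that it goes to minus infinity as the slope
grows, so the penalized maximum stays finite even when the ordinary maximum
does not exist.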

Reference --

Hope this helps,

Jeff Miller
University of Florida
AlphaPoint05, Inc.

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ted Harding
Sent: Thursday, March 15, 2007 2:39 PM
To: R-help
Subject: Re: [R] logistic regression

On 15-Mar-07 17:03:50, Milton Cezar Ribeiro wrote:
> Dear All,
> I would like to fit a model to, and find the "R2" of, the following
> presence/absence data:
> x<-1:10
> y<-c(0,0,0,0,0,0,0,1,1,1)
> I tried using clogit (survival package) but it didn't work.
> Any idea?
> miltinho

You are trying to fit an equation

  P[y = 1 ; x] = exp((x-a)/b)/(1 + exp((x-a)/b))

to data

  x =   1   2   3   4   5   6   7   8   9  10

  y =   0   0   0   0   0   0   0   1   1   1

by what amounts to a maximum-likelihood method, i.e. which chooses the
parameter values to maximize the probability of the observed values of y
(given the values of x).

The maximum probability possible is 1, so if you can find parameters which
make P[y = 1] = 0 for x = 1, 2, ... , 7 and P[y = 1] = 1 for x = 8, 9, 10
then you have done it.

This will be approximated as closely as you please for any value of a
between 7 and 8, and sufficiently small values of b, since as b -> 0 with
such a value of a, P[y = 1 ; x] -> 0 for x < a, and -> 1 for x > a.
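
A quick numerical check of this, as a sketch only: the function name loglik
and the particular values of a and b tried here are chosen just for
illustration.

```r
x <- 1:10
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Log-likelihood of P[y = 1 ; x] = exp((x-a)/b)/(1 + exp((x-a)/b))
loglik <- function(a, b) {
  eta <- (x - a) / b
  # plogis(eta) is P[y = 1]; plogis(-eta) is P[y = 0]
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

b_values <- c(1, 0.5, 0.1, 0.01)

# Probability of the observed y for a = 7.5 as b shrinks
lik_mid <- sapply(b_values, function(b) exp(loglik(7.5, b)))

# The same for another a between 7 and 8
lik_off <- sapply(b_values, function(b) exp(loglik(7.2, b)))

print(data.frame(b = b_values, lik_a7.5 = lik_mid, lik_a7.2 = lik_off))
```

Both columns climb towards 1 as b decreases, so no finite (a, b) attains the
maximum.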

You therefore have a solution which is both indeterminate (any a such that 7
< a < 8) and singular (b -> 0). So it will defeat standard estimation
methods.
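
To see the symptom in practice, here is a small sketch that runs glm() from
base R on these data and collects its warnings. Taking glm() as the
"standard" fitting routine is my assumption; the variable msgs is just a
name used for this example.

```r
x <- 1:10
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Collect warnings instead of letting them print at the end
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                         # warnings raised during fitting
print(coef(fit))                    # very large intercept and slope
print(summary(fit)$coefficients)    # note the huge standard errors
```

In the model's terms the slope is 1/b, so a runaway slope estimate is the
b -> 0 behaviour described above.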

That is the source of your problem. In a more general context, this is an
instance of the "linear separation" problem in logistic regression (and
similar methods, such as probit analysis). Basically, this situation implies
that, according to the data, there is a perfect prediction for the results.

There is no well-defined way of dealing with it; any approach starts from
the proposition "this perfect prediction is not a reasonable result in the
context of my data", and continues by following up what you think should be
meant by "not a reasonable result". What this is likely to mean would be on
the lines of "b should not be that small", which then imposes upon you the
need to be more specific about how small b may reasonably be. Then carry on
from there (perhaps by fixing the value of b at different reasonable levels,
and simply fitting a for each value of b).
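
A minimal sketch of that last suggestion follows. The values of b tried here
are purely hypothetical stand-ins for whatever is "reasonable" in your
context, and negloglik and a_hat are names invented for the example.

```r
x <- 1:10
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Negative log-likelihood in a, with b held fixed
negloglik <- function(a, b) {
  eta <- (x - a) / b
  -sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

b_values <- c(2, 1, 0.5)  # hypothetical "reasonable" values of b

# For each fixed b, find the a that maximizes the likelihood
a_hat <- sapply(b_values, function(b) {
  optimize(negloglik, interval = range(x), b = b)$minimum
})

print(data.frame(b = b_values, a_hat = a_hat))
```

With b fixed, the likelihood has a single finite maximum in a, so the
one-dimensional search is well behaved.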

Hoping this helps ... but I'm wondering how it happens that you have such
data ... ??

best wishes,

E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Mar-07                                       Time: 19:38:51
------------------------------ XFMail ------------------------------

