[R] Nonlinear logistic regression fitting

Wed Jul 29 22:15:37 CEST 2020

Thanks Duncan,

People working in my field are frequently (and rightly) accused of butchering statistical terminology. So I guess I'm guilty as charged 😄

I will look into the suggested path. One question though in your expression:

loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) )
a + b*x/(c+x)

________________________________
From: Duncan Murdoch <murdoch.duncan using gmail.com>
Sent: Wednesday, July 29, 2020 16:04
To: Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>; J C Nash <profjcnash using gmail.com>; r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] Nonlinear logistic regression fitting

Just a quick note about jargon:  you are using the word "likelihood" in
a way that I (and maybe some others) find confusing. (In fact, I think
you used it two different ways, but maybe I'm just confused.) I would
say that likelihood is the probability of observing the entire data set,
considered as a function of the parameters.  You appear to be using it
(at first) as the probability that a particular observation is equal to
1, and then as the argument to a logit function to give that probability.

What you probably want to do is find the parameters that maximize the
likelihood (in my sense).  The usual practice is to maximize the log of
the likelihood; it tends to be easier to work with.  In your notation
below, the log likelihood would be

   loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) )

When you have a linear logistic regression model, this simplifies a bit,
and there are fast algorithms that are usually stable to optimize it.
With a nonlinear model, you lose some of that, and I'd suggest directly
optimizing it.

Duncan Murdoch

On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote:
> Thank your, Pr. Nash, for your perspective on the issue.
>
> Here is an example of binary data/response (resp) that were simulated and re-estimated assuming a non linear effect of the predictor (x) on the likelihood of response. For re-estimation, I have used gnlm::bnlr for the logistic regression. The accuracy of the parameter estimates is so-so, probably due to the low number of data points (8*nx). For illustration, I have also include a glm call to an incorrect linear model of x.
>
> #install.packages(gnlm)
> #require(gnlm)
> set.seed(12345)
>
> nx <- 10
> x <- c(
>    rep(0, 3*nx),
>    rep(c(10, 30, 100, 500, 1000), each = nx)
> )
> rnd <- runif(length(x))
> a <- log(0.2/(1-0.2))
> b <- log(0.7/(1-0.7)) - a
> c <- 30
> likelihood <- a + b*x/(c+x)
> p <- exp(likelihood) / (1 + exp(likelihood))
> resp <- ifelse(rnd <= p, 1, 0)
>
> df <- data.frame(
>    x = x,
>    resp = resp,
>    nresp = 1- resp
> )
>
> head(df)
>
> # glm can only assume linear effect of x, which is the wrong model
> glm_mod <- glm(
>    resp~x,
>    data = df,
>    family = 'binomial'
> )
> glm_mod
>
> # Using gnlm package, estimate a model model with just intercept, and a model with predictor effect
> int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) )
> emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit',  mu = ~ p_a + p_b*x/(p_c+x),  pmu = c(a, b, c) )
>
> int_mod
> emax_mod
>
> ________________________________
> From: J C Nash <profjcnash using gmail.com>
> Sent: Tuesday, July 28, 2020 14:16
> To: Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>; r-help using r-project.org <r-help using r-project.org>
> Subject: Re: [R] Nonlinear logistic regression fitting
>
> There is a large literature on nonlinear logistic models and similar
> curves. Some of it is referenced in my 2014 book Nonlinear Parameter
> Optimization Using R Tools, which mentions nlxb(), now part of the
> nlsr package. If useful, I could put the Bibtex refs for that somewhere.
>
> nls() is now getting long in the tooth. It has a lot of flexibility and
> great functionality, but it did very poorly on the Hobbs problem that
> rather forced me to develop the codes that are 3/5ths of optim() and
> also led to nlsr etc. The Hobbs problem dated from 1974, and with only
> 12 data points still defeats a majority of nonlinear fit programs.
> nls() poops out because it has no LM stabilization and a rather weak
> forward difference derivative approximation. nlsr tries to generate
> analytic derivatives, which often help when things are very badly scaled.
>
> Another posting suggests an example problem i.e., some data and a
> model, though you also need the loss function (e.g., Max likelihood,
> weights, etc.). Do post some data and functions so we can provide more
> focussed advice.
>
> JN
>
> On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote:
>> Hi
>>
>> I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as:
>>
>> L = intercept + emax * x / (EC50+x)
>>
>> Which I guess could be expressed as the following R model
>>
>> ~ emax*x/(ec50+x)
>>
>> As far as I know (please, correct me if I am wrong), fitting such a model is to not doable with glm, since the function is not linear.
>>
>> A Stackoverflow post recommends the bnlr function from the gnlm (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function.
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]