[R] How to find inverse of glm model?

Fri Apr 25 03:31:52 CEST 2025

Luigi,

Your diagnostic steps for estimating Min, Max, and HalfFluoCycle look spot-on. You correctly identified that your linear model gave an unreasonably large slope (SLP = 57,724), which likely caused the nls() convergence failure in mod1.

Makes sense that fixing the slope at 2 gave you a stable fit. That confirms the importance of reasonable starting values, especially for parameters like Slope, which drastically influence the logistic shape.

Recommendations for Generalizing This:
If you're applying this to other datasets, you don’t necessarily need to "hard-code" Slope = 2, but you can default to 2 and allow refinement if the fit isn't good.

Here’s one simple strategy:

># fallback slope guess
>initial_slope <- 2
>
># optionally, refine using the steepest point
># Find approximate cycle where derivative (change) is highest
>diffs <- diff(df$Fluo)
>steepest_idx <- which.max(diffs)
># Estimate slope based on this region
>if (steepest_idx > 1 && steepest_idx < nrow(df)-1) {
>  local_mod <- lm(Fluo ~ Cycle, data = df[(steepest_idx-1):>(steepest_idx+1), ])
>  est_slope <- abs(local_mod$coefficients[2])
>  if (est_slope < 1000) initial_slope <- est_slope  # sanity check
>}

Then use:

>mod <- nls(Fluo ~ MaxFluo / (1 + exp(-(Cycle - HalfFluoCycle)/Slope)) + >Noise,
>  data = df,
>  start = list(MaxFluo = MAX_FLUO,
>             HalfFluoCycle = HLF_CYC,
>             Slope = initial_slope,
>             Noise = S))

Fixing the Slope Can Still Be Valid - in practice, if your datasets have similar kinetics (e.g., same reaction conditions or system), the slope shape might be consistent across runs, so fixing or constraining the slope (e.g., bounding it within 1 < Slope < 5) is entirely reasonable and will increase model stability.

You can also use nlsLM() from minpack.lm  - it should provide better convergence and bounds:

>library(minpack.lm)
>mod <- nlsLM(Fluo ~ MaxFluo / (1 + exp(-(Cycle - HalfFluoCycle)/Slope)) >+ Noise,
>   data = df,
>   start = list(MaxFluo = MAX_FLUO,
>             HalfFluoCycle = HLF_CYC,
>             Slope = 2,
>             Noise = S),
>    lower = c(0, 0, 0.5, 0), upper = c(Inf, 35, 10, Inf))

Next best step would be to package this into a reusable function — either for applying it across multiple datasets or for doing reverse lookups (like finding the Cycle corresponding to a specific Fluo reading).

Best,
Gregg

On Thursday, April 24th, 2025 at 12:27 PM, Luigi Marongiu <marongiu.luigi using gmail.com> wrote:

> 

> 

> Thank you. Following your tips I tried to guess the starting values
> using an approached that determined (1) the background level (the
> fluorescence before the take over signal), (2) the slope between this
> point and the maximum:
> `df <- data.frame(Cycle=1:35, Fluo=c(15908.54, 16246.92, 16722.43, 17104.29, 16794.93, 17031.44, 17299.05, 17185.49, 17362.28, 17127.43, 17368.96, 17513.19, 17593.95, 17626.37, 18308.29, 18768.12, 19955.26, 22493.85, 27088.12, 36539.44, 53694.18, 84663.18, 138835.64, 223331.89, 336409.69, 457729.88, 561233.12, 637536.31, 688985.88, 723158.56, 746575.62, 766274.75, 776087.75, 785144.81, 791573.81) ) # estimate starting values S = sd(df$Fluo[1:3])*10 for (j in 4:nrow(df)) { x = df$Fluo[1:j] s = sd(x) q = df$Fluo[j] cat("SD = ", S, "Fluo = ", q, "\\n") if(q<S) { break } else { S = sd(df$Fluo[1:j])*10 } } MIN_FLUO = q MIN_IDX = which (df$Fluo == MIN_FLUO) MIN_CYC = df$Cycle[MIN_IDX] MAX_FLUO = max(df$Fluo) MAX_IDX = which (df$Fluo == MAX_FLUO) MAX_CYC = df$Cycle[MAX_IDX] HLF_CYC = MIN_CYC + (MAX_CYC - MIN_CYC)/2 q_mod = lm(Fluo~Cycle, df[MIN_IDX:MAX_IDX,]) SLP = q_mod$coefficients[2] # plot plot(Fluo~Cycle, df) points(MIN_FLUO~MIN_CYC, col="blue", pch=16) points(MAX_FLUO~MAX_CYC, col="red", pch=16) abline(h=MAX_FLUO, col="red") abline(q_mod, col="green") abline(v=HLF_CYC, col="gold") abline(h=S, col="purple") legend("topleft", legend=c("HalfFluoCycle", "MaxFluo", "Slope", "Noise"), lty=1, col=c("gold", "red", "green", "purple"), title="Estimated parameters", lwd=2) legend("left", legend=c("Min", "Max"), title="Selected interval", pch=16, col=c("blue", "red")) # regression mod1 = nls(Fluo ~ (MaxFluo / (1+ exp(-(Cycle - HalfFluoCycle)/Slope)) + Noise), data=df, start=list(MaxFluo=MAX_FLUO, HalfFluoCycle=HLF_CYC, Slope=SLP, Noise=S)) mod2 = nls(Fluo ~ (MaxFluo / (1+ exp(-(Cycle - HalfFluoCycle)/Slope)) + Noise), data=df, start=list(MaxFluo=MAX_FLUO, HalfFluoCycle=HLF_CYC, Slope=2, Noise=S)) lines(df$Cycle, predict(mod2), col = "orange", lwd = 2)`
> Model 1 failed probably because SLP = 57724.29 , which is a weird
> slope value; model 2 instead worked, using your suggested value of 2.
> If I could hard-code the slope to 2, unless there is a better way to
> determine such a slope, then I think I could try to extend the
> approach to other data...
> Best regards
> Luigi
> 

> On Wed, Apr 23, 2025 at 6:43 PM Gregg Powell g.a.powell using protonmail.com wrote:
> 

> > Hello Luigi,
> > 

> > Great follow-up — Looks like you’re on the right track using nls() for nonlinear regression. You're fitting a logistic-like sigmoidal model (as in Rutledge’s paper), I think both errors you’re encountering are signs of bad starting values or a maybe model formulation issue.
> > 

> > Breakdown of Your Model Attempt
> > Your model formula:
> > 

> > > Fluo ~ (MaxFluo / (1 + exp(-(Cycle - (HalfFluoCycle / Slope)))) + 1)
> > 

> > This expression has perhaps two issues:
> > 

> > 1. Parameter placement: In the paper, the term inside exp() should look like -(Cycle - HalfFluoCycle)/Slope — not Cycle - (HalfFluoCycle / Slope).
> > 

> > 2. +1 additive constant: You called it backgroundSignal, but it’s hard-coded as +1. That constant should be a parameter (e.g., Bkg), otherwise the model fit has no way to adjust it.
> > 

> > Alternative Model Form:
> > Here’s a likely corrected version of the model based on Rutledge’s sigmoidal fit:
> > 

> > > mod1 <- nls(Fluo ~ MaxFluo / (1 + exp(-(Cycle - HalfFluoCycle)/Slope)) > + Bkg,
> > > data = df,
> > > start = list(MaxFluo = max(df$Fluo),
> > > HalfFluoCycle = 20,
> > > Slope = 2,
> > > Bkg = min(df$Fluo)))
> > 

> > Notes on nls() Convergence
> > nls() is very sensitive to starting values. Always inspect your plot and roughly estimate parameters:
> > 

> > MaxFluo: Use max(Fluo)
> > 

> > Bkg: Use min(Fluo)
> > 

> > HalfFluoCycle: Approximate where the curve inflects
> > 

> > Slope: Try values like 1, 2, 5 depending on steepness
> > 

> > You can also add trace=TRUE to watch convergence progress:
> > 

> > > mod1 <- nls(..., trace=TRUE)
> > 

> > About SSmicmen
> > From what I've gathered - this is meant for fitting the Michaelis-Menten model (enzyme kinetics), not sigmoidal growth. That's why mod2 fails — wrong self-starting model function.
> > 

> > Visualizing the Fit
> > After fitting:
> > 

> > > plot(Fluo ~ Cycle, data = df)
> > > lines(df$Cycle, predict(mod1), col = "red", lwd = 2)
> > 

> > You might also want to compute the inverse (e.g., solving for Cycle given a fluorescence value) — that would involve solving the nonlinear equation manually or using uniroot().
> > 

> > All the best,
> > gregg
> > 

> > On Wednesday, April 23rd, 2025 at 8:23 AM, Luigi Marongiu marongiu.luigi using gmail.com wrote:
> > 

> > > Further on this, I have found a formula from a paper from Rutledge
> > > (DOI: 10.1093/nar/gnh177), which I rendered as
> > > `MaxFluo / (1+ exp(-(Cycle-(HalfFluoCycle/Slope)))) + backgroundSignal`
> > > I then see that one can use `nls` to fit non-linear regression, so I tried:
> > > `df <- data.frame(Cycle=1:35, Fluo=c(15908.54, 16246.92, 16722.43, 17104.29, 16794.93, 17031.44, 17299.05, 17185.49, 17362.28, 17127.43, 17368.96, 17513.19, 17593.95, 17626.37, 18308.29, 18768.12, 19955.26, 22493.85, 27088.12, 36539.44, 53694.18, 84663.18, 138835.64, 223331.89, 336409.69, 457729.88, 561233.12, 637536.31, 688985.88, 723158.56, 746575.62, 766274.75, 776087.75, 785144.81, 791573.81) ) plot(Fluo~Cycle, df) mod1 = nls(Fluo~(MaxFluo / (1+ exp(-(Cycle-(HalfFluoCycle/Slope)))) + 1), data=df, start=list(MaxFluo=max(df$Fluo), HalfFluoCycle=15, Slope=0.1)) mod2 = nls(Fluo~SSmicmen(Fluo, Cycle), data=df)`
> > > but I got errors in both cases:
> > > ```
> > 

> > > > mod1
> > 

> > > Error in nlsModel(formula, mf, start, wts, scaleOffset = scOff,
> > > nDcentral = nDcntr) :
> > > singular gradient matrix at initial parameter estimates
> > 

> > > > mod2
> > 

> > > Error in qr.qty(QR.rhs, .swts * ddot(attr(rhs, "gradient"), lin)) :
> > > NA/NaN/Inf in foreign function call (arg 5)
> > > ```
> > > How can I properly set this regression model?
> > > Thank you
> > 

> > > On Wed, Apr 16, 2025 at 7:08 AM Luigi Marongiu marongiu.luigi using gmail.com wrote:
> > 

> > > > Thank you. This topic is more complicated than anticipated. Best regards, Luigi
> > 

> > > > On Tue, Apr 15, 2025 at 11:09 PM Andrew Robinson apro using unimelb.edu.au wrote:
> > 

> > > > > A statistical (off-topic!) point to consider: when the GLM was fitted, you conditioned on x and let y be the random variable. Therefore the model supports predictions of y conditional on x. You’re now seeking to make predictions of x conditional on y. That’s not the same thing, even in OLS.
> > 

> > > > > It might not matter for your application but it’s probably worth thinking about. Simulation experiments might shed some light on that.
> > 

> > > > > Cheers, Andrew
> > 

> > > > > --
> > > > > Andrew Robinson
> > > > > Chief Executive Officer, CEBRA and Professor of Biosecurity,
> > > > > School/s of BioSciences and Mathematics & Statistics
> > > > > University of Melbourne, VIC 3010 Australia
> > > > > Tel: (+61) 0403 138 955
> > > > > Email: apro using unimelb.edu.au
> > > > > Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/
> > 

> > > > > I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
> > 

> > > > > On 16 Apr 2025 at 1:01 AM +1000, Gregg Powell via R-help r-help using r-project.org, wrote:
> > 

> > > > > Take a look at this Luigi...
> > 

> > > > > # The model is: logit(p) = β₀ + β₁*Cycles
> > > > > # Where p is the probability (or normalized value in your case)
> > 

> > > > > # The inverse function would be: Cycles = (logit⁻¹(p) - β₀)/β₁
> > > > > # Where logit⁻¹ is the inverse logit function (also called the expit >function)
> > 

> > > > > # Extract coefficients from your model
> > > > > intercept <- coef(b_model)[1]
> > > > > slope <- coef(b_model)[2]
> > 

> > > > > # Define the inverse function
> > > > > inverse_predict <- function(p) {
> > > > > # p is the probability or normalized value you want to find the >cycles for
> > > > > # Inverse logit: log(p/(1-p)) which is the logit function
> > > > > logit_p <- log(p/(1-p))
> > 

> > > > > # Solve for Cycles: (logit(p) - intercept)/slope
> > > > > cycles <- (logit_p - intercept)/slope
> > 

> > > > > return(cycles)
> > > > > }
> > 

> > > > > # Example: What cycle would give a normalized value of 0.5?
> > > > > inverse_predict(0.5)
> > 

> > > > > This function takes a probability (normalized value between 0 and 1) and returns the cycle value that would produce this probability according to your model.
> > > > > Also:
> > > > > This works because GLM with binomial family uses the logit link function by default
> > > > > The inverse function will return values outside your original data range if needed
> > > > > This won't work for p=0 or p=1 exactly (you'd get -Inf or Inf), so you might want to add checks
> > 

> > > > > All the best,
> > > > > Gregg
> > 

> > > > > On Tuesday, April 15th, 2025 at 5:57 AM, Luigi Marongiu marongiu.luigi using gmail.com wrote:
> > 

> > > > > I have fitted a glm model to some data; how can I find the inverse
> > > > > function of this model? Since I don't know the y=f(x) implemented by
> > > > > glm (this works under the hood), I can't define a f⁻¹(y).
> > > > > Is there an R function that can find the inverse of a glm model?
> > > > > Thank you.
> > 

> > > > > The working example is:
> > > > > `V = c(120.64, 66.14, 34.87, 27.11, 8.87, -5.8, 4.52, -7.16, -17.39, -14.29, -20.26, -14.99, -21.05, -20.64, -8.03, -21.56, -1.28, 15.01, 75.26, 191.76, 455.09, 985.96, 1825.59, 2908.08, 3993.18, 5059.94, 6071.93, 6986.32, 7796.01, 8502.25, 9111.46, 9638.01, 10077.19, 10452.02, 10751.81, 11017.49, 11240.37, 11427.47, 11570.07, 11684.96, 11781.77, 11863.35, 11927.44, 11980.81, 12021.88, 12058.35, 12100.63, 12133.57, 12148.89, 12137.09) df = data.frame(Cycles = 1:35, Values = V[1:35]) M = max(df$Values) df$Norm = df$Values/M df$Norm[df$Norm<0] = 0 b_model = glm(Norm ~ Cycles, data=df, family=binomial) plot(Norm ~ Cycles, df, main="Normalized view", xlab=expression(bold("Time")), ylab=expression(bold("Signal (normalized)")), type="p", col="cyan") lines(b_model$fitted.values ~ df$Cycles, col="blue", lwd=2)`
> > 

> > > > > ______________________________________________
> > > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > 

> > > > --
> > > > Best regards,
> > > > Luigi
> > 

> > > --
> > > Best regards,
> > > Luigi
> 

> 

> 

> 

> --
> Best regards,
> Luigi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 603 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20250425/38fb3369/attachment.sig>