[R-sig-eco] Fitting logistic function to data that crosses the origin

Wed Jun 19 14:08:07 CEST 2013

Dear Phil,

I did not understand why you prefer working on logged data, if 
untransformed data are already nicely S-shaped. Is it for fitting the 
classical methods applied in all the previous studies?

Anyway, I thik the nls self-start SSfpl() might be usefull. It fits a 
four parameter sigmoid : inflection point, "slope", upper asymptote, and 
lower asymptote. Thanks to this fourth parameter, you could deal with 
negative y values without any rescaling.

Maybe SSasympOff() might be also of interest, it is an Asymptotic 
Regression Model with an Offset. And also all other self-start nlm:    
?selfStart()

HTH
Pierre

Le 18/06/2013 23:28, Novack-Gottshall, Philip M. a écrit :
> Greetings,
>
> I am looking for advice on fitting a logistic function to logged data that includes negative values.
>
> Background: The data is body-size data, which is conventionally analyzed in logarithmic form. Because the dependent variables cross the value of 1, the logged data includes both positive and negative values. I'm comparing the standard power function and the logistic function because I have reason to believe there is allometry, which approximately follows a logistic curve during ontogeny. And I want to use AICc model comparison in my model selection.
>
> Logistic functions, by definition, asymptotically approach (and never reach) zero, making it impossible to fit data that crosses the origin. But the data (see sample below), does have a sigmoidal appearance (both when plotted in linear and logged axes.) So I would like to fit a logistic model to it.
>
> I have no problem fitting a logistic model to the linear form. And I can shift the logged data upward by an arbitrary value so that all response values are positive. But because I want to use the log-likelihoods for the model fits to conduct model selection, I don't like this solution. This is because it's arbitrary, because the value of logLik() can change depending on how much I shift the dependent variables, and because I'd need to shift different samples by different amounts.
>
> Can anyone point me in a good direction on how to better fit logistic models to this data? (Or to calculate the correct logLik and residual sum-of-squares.) Should I just compare logistic in linear space to the power function in log-log space? Or how about rescaling the data between 0 and 1, which is arbitrary, but at least consistent among samples?
>
> Thanks,
> Phil
>
>
> # Basic example using sample data in linear-space
> y <- c(0.01, 0.10, 0.17, 0.69, 1.35, 3.45, 4.07, 4.12)
> x <- c(0.28, 0.64, 0.76, 1.43, 1.98, 2.99, 3.97, 4.43)
> data <- data.frame(x=x, y=y)
> plot(data)
> # Estimate starting parameters
> K <- max(data$y)      # Carrying capacity
> x.0 <- min(data$y)    # Starting size
> r <- 2                          # Rate
> nlsfit <- nls(y ~ K / (1 + ((K - x.0)/x.0) * exp(-r * x)), data=data, start=list(K=K, r=r, x.0=x.0))
> nlsfit
> logLik(nlsfit)
> xfit <- seq(min(data$x), max(data$x), (max(data$x)- min(data$x))/100)
> yfit <- coef(nlsfit)[1] / (1 + ((coef(nlsfit)[1] - coef(nlsfit)[3]) / coef(nlsfit)[3]) * exp(-coef(nlsfit)[2] * xfit))
> lines(spline(xfit, yfit), col="red")
> # K = 4.185, r = 2.117, x.0 = 0.0325
> # residual sum-of-squares: 0.0228
> # logLik = 12.094
>
> # Now re-do, taking the log of both variables:
> data <- data.frame(x=log(x), y=log(y))
> plot(data)
> K <- max(data$y) ; x.0 <- min(data$y); r <- .7
> nlsfit <- nls(y ~ K / (1 + ((K - x.0)/x.0) * exp(-r * x)), data=data, start=list(K=K, r=r, x.0=x.0))
> nlsfit
> logLik(nlsfit)
> xfit <- seq(min(data$x), max(data$x), (max(data$x)- min(data$x))/100)
> yfit <- coef(nlsfit)[1] / (1 + ((coef(nlsfit)[1] - coef(nlsfit)[3]) / coef(nlsfit)[3]) * exp(-coef(nlsfit)[2] * xfit))
> lines(spline(xfit, yfit), col="red")
> # Line reaches asymptote as y approaches zero. (This is also how the logistic function works.)
> # K = -4.738, r = -4.066, x.0 = -0.694
> # residual sum-of-squares: 5.741
> # logLik = -10.024
>
> # Now re-do, taking the log of both variables, then vertically shift by adding value of 5 to y variable to ensure all positive values:
> data <- data.frame(x=log(x), y=log(y) + 5)
> plot(data)
> K <- max(data$y) ; x.0 <- min(data$y) ; r <- 2
> nlsfit <- nls(y ~ K / (1 + ((K - x.0)/x.0) * exp(-r * x)), data=data, start=list(K=K, r=r, x.0=x.0))
> nlsfit
> logLik(nlsfit)
> xfit <- seq(min(data$x), max(data$x), (max(data$x)- min(data$x))/100)
> yfit <- coef(nlsfit)[1] / (1 + ((coef(nlsfit)[1] - coef(nlsfit)[3]) / coef(nlsfit)[3]) * exp(-coef(nlsfit)[2] * xfit))
> lines(spline(xfit, yfit), col="red")
> # K = 6.727, r = 1.810, x.0 = 3.843
> # residual sum-of-squares: 0.354
> # logLik =  1.120 # But this and residual sum-of-squares varies depending on what value data were shifted by.
>
> # Now take logged data and use vertically shifted model and down-shift it:
> data <- data.frame(x=log(x), y=log(y))
> plot(data)
> xfit <- seq(min(data$x), max(data$x), (max(data$x)- min(data$x))/100)
> yfit <- coef(nlsfit)[1] / (1 + ((coef(nlsfit)[1] - coef(nlsfit)[3]) / coef(nlsfit)[3]) * exp(-coef(nlsfit)[2] * xfit)) - 5
> lines(spline(xfit, yfit), col="red")
> # The model now fits well (visually), but I don't trust the logLik and residual values given the arbitrary vertical shift.
>
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   Phil Novack-Gottshall
>   Assistant Professor
>   Department of Biological Sciences
>   Benedictine University
>   5700 College Road
>   Lisle, IL 60532
>
>   pnovack-gottshall at ben.edu<mailto:pnovack-gottshall at ben.edu>
>   Phone: 630-829-6514
>   Fax: 630-829-6547
>   Office: 332 Birck Hall
>   Lab: 107 Birck Hall
>   http://www1.ben.edu/faculty/pnovack-gottshall
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology