[R] Non-linear curve fitting (nls): starting point and quality of fit
Ben Bolker
bbolker at gmail.com
Mon Jun 4 21:31:13 CEST 2012
Nerak <nerak.t <at> hotmail.com> writes:
>
> Hi all,
>
> Like a lot of people, I noticed that I get different results when I use nls
> in R compared to the exponential fit in Excel. A bit annoying, because often
> the R^2 is higher in Excel, but from reading the different topics on this
> forum I understand that using R is better than Excel?
>
> (I don't really understand how the difference occurs, but I understand that
> the fitting is done differently: in Excel a single value can make the
> difference, while R looks at the whole function? I read this: "Fitting a
> function is an approximation, trying to find a minimum. Think of a frozen
> mountain lake surrounded by mountains. Excel's Solver will report the
> highest tip of a snowflake on the lake, if it finds it. nls will find out
> that the lake is essentially flat compared to the surroundings and tell you
> this fact in unkind words." )
Snarky, but I like it.
Two alternatives to nls are (1) Gabor Grothendieck's nls2 package:
nls2 is an R package that adds the "brute-force" algorithm and
multiple starting values to the R nls function. nls2 is free
software licensed under the GPL and available from CRAN. It
provides a function, nls2, which is a superset of the R nls
function which it, in turn, calls.
Or John Nash's nlmrt package https://r-forge.r-project.org/R/?group_id=395 :
nlmrt provides tools for working with nonlinear least squares
problems using a calling structure similar to, but much
simpler than, that of the nls() function. Moreover, where
nls() specifically does NOT deal with small or zero residual
problems, nlmrt is quite happy to solve them. It also attempts
to be more robust in finding solutions, thereby avoiding
singular gradient messages that arise in the Gauss-Newton
method within nls(). The Marquardt-Nash approach in nlmrt
generally works more reliably to get a solution, though this
may be one of a set of possibilities, and may also be
statistically unsatisfactory.
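As a minimal sketch of the nls2 brute-force approach described above: the data below are simulated and the grid bounds are made up for illustration; the idea is to score many candidate starts and then refine the best one with plain nls().

```r
## Hedged sketch: brute-force start-value search with the nls2 package.
## The data and the (a, b) grid bounds here are invented for illustration.
library(nls2)

set.seed(1)
x <- 1:20
y <- 5 * exp(-0.3 * x) + rnorm(20, sd = 0.05)
dat <- data.frame(x, y)

## Each row of start_grid is a candidate start; "brute-force" just
## evaluates the residual sum of squares at every row.
start_grid <- expand.grid(a = seq(0.1, 10,  length.out = 10),
                          b = seq(0.01, 1,  length.out = 10))
fit_grid <- nls2(y ~ a * exp(-b * x), data = dat,
                 start = start_grid, algorithm = "brute-force")

## Refine the best grid point with ordinary nls()
fit <- nls(y ~ a * exp(-b * x), data = dat,
           start = as.list(coef(fit_grid)))
coef(fit)
```

This two-step pattern (coarse grid, then local refinement) is one common way to sidestep the starting-value sensitivity discussed in this thread.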
> I have several questions about nls:
>
> 1. The nls method doesn't give an R^2. But I want to determine the quality
> of the fit. To understand how to use nls I read "Technical note: Curve
> fitting with the R environment for Statistical Computing". In that document
> they suggested this to calculate R^2:
>
> RSS.p <- sum(residuals(fit)^2)
> TSS <- sum((y - mean(y))^2)
> r.squared <- 1 - (RSS.p / TSS)
> LIST.rsq <- r.squared
>
> (with fit being my nls result: formula y ~ exp.f(x, a, b), where
> exp.f(x, a, b) is a*exp(-b*x))
>
> While I was reading on the internet to find a possible reason why I get
> different results using R and excel, I also read lots of different things
> about the "R^2 problem" in nls.
>
> Is the method I'm using now ok, or would someone suggest using something
> else?
You could use the residual sum of squares as the quality of the fit:
(i.e. RSS.p above). If you want a _unitless_ metric of the quality
of the fit, I'm not sure what you should do.
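To make the suggestion concrete, here is a hedged sketch on simulated data (the model y = a*exp(-b*x) follows the thread): it computes the residual sum of squares, the residual standard error (an absolute measure, in the units of y), and the pseudo-R^2 from the technical note the poster cites, which should be interpreted with care for nonlinear models.

```r
## Sketch: goodness-of-fit measures for an nls fit (simulated data).
set.seed(1)
x <- 1:20
y <- 5 * exp(-0.3 * x) + rnorm(20, sd = 0.05)

fit <- nls(y ~ a * exp(-b * x), start = list(a = 2, b = 0.2))

RSS <- sum(residuals(fit)^2)      # residual sum of squares
sigma_hat <- summary(fit)$sigma   # residual standard error, same units as y

## The "pseudo-R^2" from the technical note; not a true R^2 for nls:
TSS <- sum((y - mean(y))^2)
pseudo_r2 <- 1 - RSS / TSS
```

The residual standard error answers "how far off is the model, typically?" in the original units, which is often more useful than a unitless ratio.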
> 2. Another question I have, like a lot of people, is about the singular
> gradient problem. I didn't know the best way to choose the starting values
> for my coefficients. When a value was too low, I got the singular gradient
> error. Raising the value got rid of it, and changing it further didn't
> change my coefficients or R^2. Is it ok just to raise the starting value of
> one of my coefficients?
[snip]
If you can find a set of starting coefficients that give you
a sensible fit to the data without any convergence warnings, you
shouldn't worry that other sets of starting coefficients that
*don't* work also exist.
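One cheap sanity check of that advice is to confirm that two different working start sets converge to the same estimates; a sketch on simulated data:

```r
## Sketch: the fitted coefficients should not depend on which
## (working) starting values were used. Simulated data for illustration.
set.seed(1)
x <- 1:20
y <- 5 * exp(-0.3 * x) + rnorm(20, sd = 0.05)

fit1 <- nls(y ~ a * exp(-b * x), start = list(a = 2,  b = 0.1))
fit2 <- nls(y ~ a * exp(-b * x), start = list(a = 10, b = 0.5))

## TRUE (up to tolerance) if both starts reach the same optimum
all.equal(coef(fit1), coef(fit2), tolerance = 1e-4)
```

If two starts converge without warnings but give materially different coefficients, that would be a sign of multiple local minima and worth investigating further.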