[R] Error "singular gradient matrix at initial parameter estimates" in nls

Bert Gunter gunter.berton at gene.com
Wed Mar 31 18:39:15 CEST 2010


Over-parameterization/non-identifiability is not determined by the ratio of
the number of data values to the number of parameters: if you try to fit a
scatter of a zillion points that lie near a straight line to a model with
curvature -- the 4p logistic function, say -- you're over-parameterized.
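
To make the straight-line example concrete, here is a minimal R sketch.
Everything in it (the x values, the parameter values, and the helper names
`logis4_jac` and `scaled_cond`) is an assumption for illustration and is not
taken from the original poster's problem. It evaluates the gradient matrix of
a 4-parameter logistic at parameter values that mimic a straight line over the
observed range, and compares its condition number with that of a clearly
sigmoidal curve on the same x values.

```r
# Illustrative sketch; data, parameter values and helper names are assumed.
set.seed(1)
x <- runif(10000, 0, 10)   # many data points do not help by themselves

# Gradient (Jacobian) of the 4-parameter logistic
#   f(x) = a + (d - a) / (1 + exp(-(x - m) / s))
# with respect to (a, d, m, s), evaluated at every x.
logis4_jac <- function(x, a, d, m, s) {
  p <- plogis((x - m) / s)
  cbind(a = 1 - p,
        d = p,
        m = -(d - a) * p * (1 - p) / s,
        s = -(d - a) * p * (1 - p) * (x - m) / s^2)
}

# Condition number after scaling each column to unit length, so that the
# result reflects near-collinearity rather than the units of the parameters.
scaled_cond <- function(J) {
  Js <- sweep(J, 2, sqrt(colSums(J^2)), "/")
  sv <- svd(Js)$d
  max(sv) / min(sv)
}

# (1) Scale s much larger than the x range: the curve is close to 2 + 0.5 * x.
cond_line <- scaled_cond(logis4_jac(x, a = -45.5, d = 54.5, m = 5, s = 50))

# (2) Both asymptotes visible within the x range: a clear sigmoid.
cond_sigmoid <- scaled_cond(logis4_jac(x, a = 0, d = 10, m = 5, s = 1))

print(c(near_line = cond_line, sigmoid = cond_sigmoid))
```

The first condition number should come out far larger than the second, even
though both use the same 10,000 x values: the gradient columns are nearly
collinear, which is the kind of situation that can lead to a "singular
gradient" complaint.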

I am not an expert on nonlinear optimization, but I would think that the
correlation matrix of the parameters would be one thing to check; another
would be the change in the fits when some of the model terms are dropped.
But model selection for nonlinear fitting is not a trivial issue, and those
with real expertise, not me, could be more helpful.
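
A minimal sketch of both checks in R, on simulated data (the model, the
values and the object names are assumptions, not the original poster's).
`summary()` on an nls fit accepts `correlation = TRUE`; off-diagonal entries
close to +1 or -1 suggest that the data cannot separate those parameters
well. The second part drops the curvature term and compares residual standard
errors.

```r
# Illustrative sketch on simulated data; nothing here comes from the thread.
set.seed(42)
x <- seq(0, 1, length.out = 200)   # narrow x range: the curve is close to linear
y <- 1 + 5 * exp(-0.5 * x) + rnorm(length(x), sd = 0.02)

# Three-parameter exponential model, started at the simulated truth
# to keep the sketch simple.
fit <- nls(y ~ a + b * exp(-k * x), start = list(a = 1, b = 5, k = 0.5))

# Check 1: correlation matrix of the parameter estimates.
par_cor <- summary(fit, correlation = TRUE)$correlation
print(round(par_cor, 3))

# Check 2: drop the curvature and see how much the fit degrades.
fit_line <- lm(y ~ x)
print(c(nls = sigma(fit), line = sigma(fit_line)))
```

In this simulation the estimates of a and b should be almost perfectly
negatively correlated, which is a warning sign even though the fit converges.
If the two residual standard errors were nearly equal, the extra parameter
would be buying very little.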

The fact that other algorithms converged while R's did not is also not
particularly informative, I believe. This could occur merely because of
different convergence criteria (step size, choice of objective function,
etc.), or the other algorithm could have converged to a local minimum, not a
global one. Comparison of performance of optimization algorithms is, again,
a cottage industry for which serious expertise is required. (This should not
be taken as a defense of R, either; my optimization ignorance cuts both ways.)
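
How much the stopping rule matters can be seen within R itself. The sketch
below (simulated data and object names are assumptions) fits the same model
from the same start twice, changing only the `tol` argument of
`nls.control()`, and reports how many iterations each run took before
declaring convergence.

```r
# Illustrative sketch on simulated data; names and values are assumed.
set.seed(7)
x <- seq(0, 10, length.out = 200)
y <- 1 + 5 * exp(-0.5 * x) + rnorm(length(x), sd = 0.1)

start <- list(a = 0.8, b = 4.5, k = 0.4)

# Same model, same data, same start; only the convergence tolerance differs.
fit_loose   <- nls(y ~ a + b * exp(-k * x), start = start,
                   control = nls.control(tol = 1e-1))
fit_default <- nls(y ~ a + b * exp(-k * x), start = start)   # tol = 1e-05

# convInfo records whether and when convergence was declared.
print(c(loose = fit_loose$convInfo$finIter,
        default = fit_default$convInfo$finIter))
print(rbind(loose = coef(fit_loose), default = coef(fit_default)))
```

A looser tolerance can only stop at the same iteration or earlier, so two
programs with different criteria may report "converged" at different points
along the same path.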

So I do not think your protestations of innocence are necessarily accurate.
Nor, I agree, are my accusations of guilt. I **still** believe that your
convergence difficulties are due to over-parameterization, and that this is
something that needs to be carefully investigated; but that is a prior, not
a posterior.  Examining lots of plots is probably a good place to begin.


Bert Gunter
Genentech Nonclinical Biostatistics


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Corrado
Sent: Wednesday, March 31, 2010 6:13 AM
Cc: r-help at r-project.org
Subject: Re: [R] Error "singular gradient matrix at initial parameter
estimates" in nls

Dear JN, Bert,

1) It is not a perfect fit. I do not think I have ever said that. I said
that an external algorithm fits the model without any problems: with ~
500,000 data points and 19 parameters (ki in the original equation), it
fits the model in less than 1 second. The data are not artificial data.
The variables are independent (pi in the original model). The solution
is unique and the speed of convergence is practically independent
of the choice of starting values (with a reasonable choice of
starting values at least). The resulting residuals are approximately
normally distributed with mean 0 and sd ~ 4.23.

2) I agree with Bert's comment on over-parameterisation, but again
the model is not over-parameterised, and it is identifiable (in part
answered already in (1)).


Prof. John C Nash wrote:
> If you have a perfect fit, you have zero residuals. But in the nls 
> manual page we have:
>> Warning:
>>      *Do not use 'nls' on artificial "zero-residual" data.*
> So this is a case of complaining that your diesel car is broken 
> because you ignored the "Diesel fuel only" sign on the filler cap and 
> put in gasoline.
> However I've not been happy with this choice in the code of nls -- 
> it's been there a long time -- and my own codes from 1974 onwards have 
> always handled zero residual cases. I do believe that the code could 
> at least give a better diagnostic message. Zero residuals -- perfect 
> fits -- arise when one is interested more or less in an interpolating 
> function rather than doing statistics, and I can understand the 
> reluctance of statisticians to countenance such a use of nls.
> And Bert's comment on overparametrization is almost certainly valid also.
> JN

Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk
