[R] Finding starting values for the parameters using nls() or nls2()

dave fournier davef at otter-rsch.com
Mon Oct 24 19:49:30 CEST 2016


I've spent quite a bit of time trying to convince people on various
lists that the solution to these kinds of problems lies in a stable
parameterization of the model.  I write the solutions in AD Model
Builder because it is easy.  But R people are generally stuck in R
(or mired in it), so the message somehow always gets lost.

I thought I would bite the bullet and figure out how to do this in R.
It turns out that one can fit this model with only 6 function
evaluations using nls2.  I apologize in advance for my execrable R
code, but it does do the job.

The data were fit using 3 calls to nls2.  For the first call only the
parameter th was estimated.  This converged in 4 function evaluations.
For the second call all three of the parameters were estimated (but
the other 2 parameters are different from b0, b1).  This converged to
the solution in four function evaluations.

The final call to nls2 uses the converged values to calculate b0, b1
and th and starts from there.  As you can see, the model is already
converged.  This final call to nls2 is used to get the standard error
estimates for the parameters.

 >  nls.m1
Nonlinear regression model
   model: Retention ~ expFct1(Area, y1, yn, th)
    data: structure(list(Area = c(521.5, 689.78, 1284.71, 2018.8, 
2560.46, 524.91, 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 
1432.5, 2144.74, 2629.2), Retention = c(95.3, 87.18, 44.94, 26.36, 
18.12, 84.68, 37.24, 33.04, 23.46, 9.72, 97.92, 71.44, 44.52, 24.44, 
15.26)), .Names = c("Area", "Retention"), row.names = c(NA, -15L), class 
= "data.frame")
     th
0.9862
  residual sum-of-squares: 730

Number of iterations to convergence: 2
Achieved convergence tolerance: 1.616e-06

 >  nls.m2
Nonlinear regression model
   model: Retention ~ expFct2(Area, y1, yn, c, d, th)
    data: structure(list(Area = c(521.5, 689.78, 1284.71, 2018.8, 
2560.46, 524.91, 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 
1432.5, 2144.74, 2629.2), Retention = c(95.3, 87.18, 44.94, 26.36, 
18.12, 84.68, 37.24, 33.04, 23.46, 9.72, 97.92, 71.44, 44.52, 24.44, 
15.26)), .Names = c("Area", "Retention"), row.names = c(NA, -15L), class 
= "data.frame")
      c      d     th
0.9716 1.1010 0.9118
  residual sum-of-squares: 686.8

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.398e-06

 >  nls.m
Nonlinear regression model
   model: Retention ~ expFct(Area, b0, b1, th)
    data: structure(list(Area = c(521.5, 689.78, 1284.71, 2018.8, 
2560.46, 524.91, 989.05, 1646.32, 2239.65, 2972.96, 478.54, 875.52, 
1432.5, 2144.74, 2629.2), Retention = c(95.3, 87.18, 44.94, 26.36, 
18.12, 84.68, 37.24, 33.04, 23.46, 9.72, 97.92, 71.44, 44.52, 24.44, 
15.26)), .Names = c("Area", "Retention"), row.names = c(NA, -15L), class 
= "data.frame")
         b0         b1         th
  5.2104452 -0.0004672  0.9118040
  residual sum-of-squares: 686.8

Number of iterations to convergence: 0
Achieved convergence tolerance: 1.384e-06

Parameters:
      Estimate Std. Error t value Pr(>|t|)
b0  5.2104452  0.4999594  10.422 2.29e-07 ***
b1 -0.0004672  0.0013527  -0.345   0.7358
th  0.9118040  0.3583575   2.544   0.0257 *

So what is the stable parameterization for this model?  To simplify,
let x be the independent variable and y be the dependent variable,
and write t instead of th.  So the model is

             y = exp(b0*exp(b1*x^t))                    (1)

Now it would be nice to reorder the x's so that they are monotone
increasing or decreasing, but in this case the first x is almost the
smallest and the last x is almost the largest (slight reach), so they
will do.  Call them x1 and xn.  The new parameters of the model are
the predicted y's for x1 and xn.  Call them y1 and yn.  Note that y1
and yn are parameters, not observations.


              y1 = exp(b0*exp(b1*x1^t))                 (2)
              yn = exp(b0*exp(b1*xn^t))

One can solve for b1 and b0 in terms of y1 and yn:

              b1 = log(log(y1)/log(yn))/(x1^t - xn^t)   (3)
              b0 = log(y1)*exp(-b1*x1^t)
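As a quick sanity check on (2) and (3) (my addition, with arbitrary
illustrative values), the inversion round-trips exactly:

```r
# Round-trip check of the reparameterization: choose b0, b1, t and two x's,
# compute the implied y1, yn from (2), then recover b0, b1 via (3).
b0 <- 5.2; b1 <- -0.00047; t <- 0.91      # illustrative values only
x1 <- 521.5; xn <- 2629.2

y1 <- exp(b0 * exp(b1 * x1^t))            # equation (2)
yn <- exp(b0 * exp(b1 * xn^t))

b1.rec <- log(log(y1) / log(yn)) / (x1^t - xn^t)   # equation (3)
b0.rec <- log(y1) * exp(-b1.rec * x1^t)

all.equal(c(b0.rec, b1.rec), c(b0, b1))   # TRUE
```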


To use this we fit the model in two stages.  In the first stage we
use the obvious estimates of y[1] for y1 and y[n] for yn, fix these
values, and estimate t using the obvious starting value of 1 for t.

In the second stage we set

                y1 = c*y[1]

                yn = d*y[n]

and estimate c, d, and t using the obvious starting values of 1 for c
and d.
That's it.  This converges as stated in 6 function evaluations.  This
technique works for many curves, such as the 3-, 4-, and 5-parameter
logistic, the double exponential, and the von Bertalanffy (where
Schnute and I first discovered it in the early 1980's, before anyone
was born).
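For instance, the same trick applied to a plain exponential
y = a*exp(b*x) (my illustration, not one of the models fitted above)
gives closed forms for a and b from the endpoint values:

```r
# Stable parameterization of y = a*exp(b*x): use the predicted values
# y1 = a*exp(b*x1) and yn = a*exp(b*xn) as the parameters instead of (a, b).
x1 <- 1; xn <- 10
a <- 2; b <- -0.3                 # illustrative values only
y1 <- a * exp(b * x1)
yn <- a * exp(b * xn)

# Solve back for b and a from the endpoint parameters:
b.rec <- log(y1 / yn) / (x1 - xn)
a.rec <- y1 * exp(-b.rec * x1)
all.equal(c(a.rec, b.rec), c(a, b))   # TRUE
```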

One caveat is that usually not all values of y1 and yn are
permissible.  For example, in this case if b0 > 0 then y1 and yn are
both > 1.  To make this routine bulletproof one needs to impose these
kinds of constraints, but that was not needed here.
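A minimal guard of that kind (my sketch, not in the original script):
equation (3) takes log(log(y1)/log(yn)), so log(y1) and log(yn) must
have the same sign, i.e. y1 and yn must lie on the same side of 1:

```r
# Permissibility check for the stable parameters: log(log(y1)/log(yn))
# in (3) is only defined when y1 and yn are both > 1 or both < 1.
check_y1_yn <- function(y1, yn) {
  if (y1 <= 0 || yn <= 0 || log(y1) * log(yn) <= 0)
    stop("y1 and yn must be positive and lie on the same side of 1")
  invisible(TRUE)
}

check_y1_yn(95.3, 15.26)   # fine: both > 1
```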

Here is my R code for this model.

library("nls2")

cl <- data.frame(
  Area = c(521.5, 689.78, 1284.71, 2018.8, 2560.46, 524.91, 989.05,
           1646.32, 2239.65, 2972.96, 478.54, 875.52, 1432.5, 2144.74,
           2629.2),
  Retention = c(95.3, 87.18, 44.94, 26.36, 18.12, 84.68, 37.24, 33.04,
                23.46, 9.72, 97.92, 71.44, 44.52, 24.44, 15.26))

  # Obvious estimates for the stable parameters: the first and last
  # observed y's.
  y1 <- cl$Retention[1]
  yn <- cl$Retention[length(cl$Retention)]

  # Stage 1 model: stable parameterization with y1, yn fixed; only th free.
  expFct1 <- function(x, y1, yn, th)
  {
    x1 <- x[1]
    xn <- x[length(x)]
    b1 <- log(log(y1)/log(yn))/(x1^th - xn^th)   # equation (3)
    b0 <- log(y1)*exp(-b1*x1^th)
    exp(b0*exp(b1*x^th))                         # equation (1)
  }

  # Stage 2 model: y1 = c*y_1, yn = d*y_n, with c, d, th all free.
  expFct2 <- function(x, y_1, y_n, c, d, th)
  {
    x1 <- x[1]
    xn <- x[length(x)]
    y1 <- c*y_1
    yn <- d*y_n
    b1 <- log(log(y1)/log(yn))/(x1^th - xn^th)
    b0 <- log(y1)*exp(-b1*x1^th)
    exp(b0*exp(b1*x^th))
  }

  # Original parameterization, used only for the final call.
  expFct <- function(x, b0, b1, th)
  {
    exp(b0*exp(b1*x^th))
  }


  # Stage 1: fix y1, yn at the observed endpoints; estimate th only.
  nls.m1 <- nls2(Retention ~ expFct1(Area, y1, yn, th), data = cl,
                 start = list(th = 1))

  th_start <- coef(nls.m1)

  # Stage 2: free up the endpoints via c and d; estimate c, d, th.
  nls.m2 <- nls2(Retention ~ expFct2(Area, y1, yn, c, d, th), data = cl,
                 start = list(c = 1, d = 1, th = th_start))

  # Convert the converged stable parameters back to b0, b1 via equation (3).
  x <- cl$Area
  x1 <- x[1]
  xn <- x[length(x)]
  th_start <- coef(nls.m2)[3]
  cy1 <- coef(nls.m2)[1]*y1
  dyn <- coef(nls.m2)[2]*yn
  b1_start <- log(log(cy1)/log(dyn))/(x1^th_start - xn^th_start)
  b0_start <- log(cy1)*exp(-b1_start*x1^th_start)

  # Print the computed starting values.
  th_start
  b1_start
  b0_start

  # Final call in the original parameterization, to get the standard errors.
  nls.m <- nls2(Retention ~ expFct(Area, b0, b1, th), data = cl,
                start = list(b0 = b0_start, b1 = b1_start, th = th_start))

  nls.m1
  nls.m2
  nls.m


