[R] nls start values

Gabor Grothendieck ggrothendieck at gmail.com
Tue Dec 13 18:03:57 CET 2011


On Tue, Dec 13, 2011 at 10:53 AM, Hans W Borchers
<hwborchers at googlemail.com> wrote:
> Niklaus Fankhauser <niklaus.fankhauser <at> cell.biol.ethz.ch> writes:
>
>> I'm using nls to fit periodic gene-expression data to sine waves. I need
>> to set the upper and lower boundaries, because I do not want any
>> negative phase and amplitude solutions. This means that I have to use
>> the "port" algorithm. The problem is, that depending on what start value
>> I choose for phase, the fit works for some cases, but not for others.
>> In the example below, the fit works using phase=pi,  but not using
>> phase=0. But there are many examples which fit just fine using 0.
>>
>> Is there a comparable alternative to nls that is not so extremely
>> influenced by the start values?
>>
>
> Use package `nls2' to first search on a grid, and then apply `nls' again
> to identify the globally best point:
>
>    # Data for example fit
>    afreq <- 1 / (24 / 2 / pi)
>    tpoints <- c(0,0,0,2,2,2,4,4,4,6,6,6,8,8,8,12,12,12,
>                 14,14,14,16,16,16,18,18,18,20,20,20,24,24,24)
>    gene_expression <-
>    c(1.551383, 1.671742, 1.549499, 1.694480, 1.632436, 1.471568, 1.623381,
>      1.579361, 1.809394, 1.753223, 1.685918, 1.754968, 1.963069, 1.820690,
>      1.985159, 2.205064, 2.160308, 2.120189, 2.194758, 2.165993, 2.189981,
>      2.098671, 2.122207, 2.012621, 1.963610, 1.884184, 1.955160, 1.801175,
>      1.829686, 1.773260, 1.588768, 1.563774, 1.559192)
>    shift=mean(gene_expression) # y-axis (expression) shift
>
>    # Grid search
>    library("nls2")
>    frml <- gene_expression ~ sin(tpoints * afreq + phase) * amp + shift
>    startdf <- data.frame(phase=c(0, 2*pi), amp = c(0, 2))
>    nls2(frml, algorithm = "grid-search", start = startdf,
>               control = list(maxiter=200))
>
>    # Perfect fit
>    startvals <- list(phase = 4.4880, amp = 0.2857)
>    sine_nls <- nls(frml, start=startvals)
>    #  phase    amp
>    # 4.3964 0.2931
>    # residual sum-of-squares: 0.1378
>

Just one small point here.  If out is the result of the nls2 call
above then then coef(out) is the best set of parameter values among
those tested on the grid and:

   nls2(out, start = out)

is a quick way to run it again starting from the value found using grid search.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list