[R] nls: different results if applied to normal or linearized data

Thu Mar 6 11:06:26 CET 2008

On Thursday 06 March 2008 (07:03:34), Prof Brian Ripley wrote:
> The only thing you are adding to earlier replies is incorrect:
>
>  	fitting by least squares does not imply a normal distribution.
>

Thanks for the clarification; 'implies' is too strong. I should have 
written 'suggests' or 'is often motivated by'.

When I wrote my post, there was only one earlier reply, which mentioned
the objective function. The point I (rather clumsily) tried to make is that
(a) and (b)+(c) differ insofar as under (a) y may take zero or negative
values, while under (b) and (c) y may only take values above zero.
So the models differ not just in the objective function but also
in their substantive interpretation, which may help in deciding which
is the 'correct' coefficient b.
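
To make the difference concrete, here is a minimal sketch with simulated
data. The original dataset is not shown in this thread, so the "true" values
of a and b, the noise level, the seed and the starting values are all
assumptions chosen for illustration. The errors are generated as
multiplicative (log-normal), which is the situation that (b) and (c) are
tailored to.

```r
# Hypothetical data: NOT the dataset from the original question.
set.seed(1)
x <- seq(1, 100, length.out = 50)
a_true <- 100    # assumed value, for illustration only
b_true <- 0.95   # assumed value, for illustration only

# Multiplicative errors: y = a * x^b * exp(e), so y is strictly positive
y <- a_true * x^b_true * exp(rnorm(length(x), sd = 0.2))

# (a) least squares on the original scale of y
fit_a <- nls(y ~ a * x^b, start = list(a = 50, b = 1))

# (b) least squares on the log10 scale, fitted with nls
fit_b <- nls(log10(y) ~ log10(a) + b * log10(x),
             start = list(a = 50, b = 1))

# (c) least squares on the log10 scale, fitted with lm
fit_c <- lm(log10(y) ~ log10(x))

# Collect the estimates of b from the three fits
b_a <- coef(fit_a)[["b"]]
b_b <- coef(fit_b)[["b"]]
b_c <- coef(fit_c)[["log10(x)"]]

# lm estimates log10(a) as the intercept; back-transform to compare with nls
a_c <- 10^coef(fit_c)[["(Intercept)"]]

print(c(a = b_a, b = b_b, c = b_c))
```

In this setup (b) and (c) minimise the same sum of squares, so their
estimates of b should agree up to the convergence tolerance of nls, while (a)
will generally give a somewhat different b because squared errors on the
original scale give more weight to observations with large y.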

> For a regression model, least-squares is in various senses optimal when
> the errors are i.i.d. and normal, but it is a reasonable procedure for
> many other situations (but not for modestly long-tailed distributions,
> the point of robust statistics).
>
> Although values from -Inf to +Inf are theoretically possible for a normal,
> it has very little mass in the tails and is often used as a model for
> non-negative quantities (and e.g. the justification of Box-Cox estimation
> relies on this).
>
> On Wed, 5 Mar 2008, Martin Elff wrote:
> > On Wednesday 05 March 2008 (14:53:27), Wolfgang Waser wrote:
> >> Dear all,
> >>
> >> I did a non-linear least square model fit
> >>
> >> y ~ a * x^b
> >>
> >> (a) > nls(y ~ a * x^b, start=list(a=1,b=1))
> >>
> >> to obtain the coefficients a & b.
> >>
> >> I did the same with the linearized formula, including a linear model
> >>
> >> log(y) ~ log(a) + b * log(x)
> >>
> >> (b) > nls(log10(y) ~ log10(a) + b*log10(x), start=list(a=1,b=1))
> >> (c) > lm(log10(y) ~ log10(x))
> >>
> >> I expected coefficient b to be identical for all three cases. However,
> >> using my dataset, coefficient b was:
> >> (a) 0.912
> >> (b) 0.9794
> >> (c) 0.9794
> >>
> >> Coefficient a also varied between option (a) and (b), 107.2 and 94.7,
> >> respectively.
> >
> > Models (a) and (b) entail different distributions of the dependent
> > variable y and different ranges of values that y may take.
> > (a) implies that y has, conditionally on x, a normal distribution and
> > has a range of feasible values from -Inf to +Inf.
> > (b) and (c) imply that log(y) has a normal distribution, that is,
> > y has a log-normal distribution and can take values from zero to +Inf.
> >
> >> Is this supposed to happen?
> >
> > Given the above considerations, different results with respect to the
> > intercept are definitely to be expected.
> >
> >> Which is the correct coefficient b?
> >
> > That depends - is y strictly non-negative or not ...
> >
> > Just my 20 cents...

-- 
10.0 times 0.1 is hardly ever 1.0
 ---- Kernighan and Plauger

-------------------------------------------------
Dr. Martin Elff
Faculty of Social Sciences
LSPWIVS (van Deth)
University of Mannheim
A5, 6
68131 Mannheim
Germany

Phone: +49-621-181-2093
Fax: +49-621-181-2099
E-Mail: elff at sowi.uni-mannheim.de
Web: http://webrum.uni-mannheim.de/sowi/elff/
     http://www.sowi.uni-mannheim.de/lspwivs/