[R] different regression coeffs with different starting point
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Mon Mar 14 19:59:57 CET 2011
On Mon, 14 Mar 2011, Jen wrote:
> Hi Bill,
> Thanks for your response and I'm sorry -- that was a misleading example of
> what I was trying to show. This one should illustrate the point:
>
> require(AER)
> data_in = c(0,6,12,18,24,30,36,42,48,54,60,66,72,78)
> data_in2 = data_in^2
> data_in3 = data_in^3
> data_out =
> c(139487.00,133333.00,62500.00,58823.00,56338.00,27549.00,4244.00,600.00,112.00,95.00,48.00,31.00,15.00,14.99)
> ldata_out = log(data_out)
>
> m1 <- lm(ldata_out ~ data_in + data_in2+data_in3)
> cubic <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m1))
The scaling of your regressors is extremely bad. data_in3 is in the
hundreds of thousands. This computationally challenging, to put it mildly.
I would recommend to either scale data_in by some constant (here 10 or 100
is enough) or to use an orthogonal coding of the polynomial. Both can be
achieved easily using the poly() function:
- Your model would be: ldata_out ~ poly(data_in, 3, raw = TRUE)
- With scaled regressors: ldata_out ~ poly(data_in/100, 3, raw = TRUE)
- With orthogonal coding: ldata_out ~ poly(data_in/100, 3)
All models conceptually lead to the same fitted values and the same
maximal likelihood. Numerically, however, the first model only achieves
suboptimal values depending on the starting values. The latter two models
seem to yield more sensible results and be close to the actual maximum.
Note that the coefficients change of course due to the different codings
of the regressors. This needs to be taken into account if you want to
compare the slopes.
In summary, I would probably use
cubicz <- tobit(ldata_out ~ poly(data_in/100, 3, raw = TRUE),
left = log(15-0.01))
which is very robust to the choice of starting values.
hth,
Z
> print(cubic)
> fitted(cubic)
>
> n= length(data_in)
> m2 <- lm(ldata_out[1:(n-1)] ~ data_in[1:(n-1)] +
> data_in2[1:(n-1)]+data_in3[1:(n-1)])
> cubic2 <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m2))
> print(cubic2)
> fitted(cubic2)
>
> The models cubic and cubic2 seem to show that the intercept value in the
> model doesn't change from its starting value:
>
>> m1
>
> Call:
> lm(formula = ldata_out ~ data_in + data_in2 + data_in3)
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 1.156e+01 1.004e-01 -7.755e-03 6.464e-05
>
>> cubic
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
> 0.01), init = coef(m1))
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 11.5559540 0.0289083 -0.0040615 0.0000229
>
> Scale: 3.759
>
>> m2
>
> Call:
> lm(formula = ldata_out[1:(n - 1)] ~ data_in[1:(n - 1)] + data_in2[1:(n -
> 1)] + data_in3[1:(n - 1)])
>
> Coefficients:
> (Intercept) data_in[1:(n - 1)] data_in2[1:(n - 1)] data_in3[1:(n
> - 1)]
> 1.150e+01 1.151e-01 -8.381e-03
> 7.122e-05
>
>> cubic2
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
> 0.01), init = coef(m2))
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 1.150e+01 3.391e-02 -4.187e-03 2.383e-05
>
> Scale: 3.759
>
> Am I missing something here??
>
> --
> View this message in context: http://r.789695.n4.nabble.com/different-regression-coeffs-with-different-starting-point-tp3353536p3354276.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list