[R] naive "collinear" weighted linear regression

Sat Nov 14 19:50:11 CET 2009

David Winsemius <dwinsemius <at> comcast.net> writes:

> 
> Which means those x, y, and "error" figures did not come from an  
> experiment, but rather from theory???
> 

The fact is I am trying to compare the results of:
(1) lm under R and 
(2) the Java applet at http://omnis.if.ufrj.br/~carlos/applets/reta/reta.html 
(3) the Fit method of the ROOT system used by CERN,
(4) the Numerical Recipes functions for weighted linear regression

The three last ones all provide, for the "fake" data set I furnished in my first
post in this thread, the same results; particularly they give erros or
uncertainties in the estimated coefficients of intercept and slope which, as
seems intuitive, are not zero at all, but of the order 0.1 or 0.2, whereas the
method lm under R issues a "Std. Error", which is zero. Independently of
terminology, which sure is of utmost importance, the data I provided should give
rise to a best fit straight line with intercept zero and slope 2, but also with
non-vanishing errors associated with them. How do I get this from lm????

I only want, for instance, calculation of the so-called covariance matrix for
the estimated coefficients, as given, for instance, in Equation (2.3.2) of the
second edition of Draper and Smith, "Applied regression analysis"; this is a
standard statistical result, right? So why does R not directly provide it as a
summary from an lm object???

> >
> > Of course the best fit coefficients should be 0 for the intercept  
> > and 2 for the slope. Furthermore, it seems completely plausible (or  
> > not?) that, since the y_i have associated non-vanishing  
> > ``errors'' (dispersions), there should be corresponding non- 
> > vanishing ``errors'' associated to the best fit coefficients, right?
> >
> > When I try:
> >
> > > fit_mod <- lm(y~x,weights=1/error^2)
> >
> > I get
> >
> > Warning message:
> > In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
> >  extra arguments weigths are just disregarded.
> 
>   (Actually the weights are for adjusting for sampling, and I do not  
> see any sampling in your "design".)
> 
> >
> > Keeping on, despite the warning message, which I did not quite  
> > understand, when I type:
> >
> > > summary(fit_mod)
> >
> > I get
> >
> > Call:
> > lm(formula = y ~ x, weigths = 1/error^2)
> >
> > Residuals:
> >         1          2          3          4
> > -5.067e-17  8.445e-17 -1.689e-17 -1.689e-17
> >
> > Coefficients:
> >             Estimate Std. Error   t value Pr(>|t|)
> > (Intercept) 0.000e+00  8.776e-17 0.000e+00        1
> > x           2.000e+00  3.205e-17 6.241e+16   <2e-16 ***
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 7.166e-17 on 2 degrees of freedom
> > Multiple R-squared:     1,      Adjusted R-squared:     1
> > F-statistic: 3.895e+33 on 1 and 2 DF,  p-value: < 2.2e-16
> >
> >
> > Naively, should not the column Std. Error be different from zero??  
> > What I have in mind, and sure is not what Std. Error means, is that  
> > if I carried out a large simulation, assuming each response y_i a  
> > Gaussian random variable with mean y_i and standard deviation  
> > 2*error=0.6, and then making an ordinary least squares fitting of  
> > the slope and intercept, I would end up with a mean for these  
> > simulated coefficients which should be 2 and 0, respectively,
> 
> Well, not precisely 2 and 0, but rather something very close ... i.e,  
> within "experimental error". Please note that numbers in the range of  
> 10e-17 are effectively zero from a numerical analysis perspective.
> 
>
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> 
>  > .Machine$double.eps ^ 0.5
> [1] 1.490116e-08

I know this all too well and it is obviously a trivial supernewbie issue, which
I have already overcome a long time ago...

> 
> > and, that's the point, a non-vanishing standard deviation for these  
> > fitted coefficients, right?? This somehow is what I expected should  
> > be an estimate or, at least, a good indicator, of the degree of  
> > uncertainty which I should assign to the fitted coefficients; it  
> > seems to me these deviations, thus calculated as a result of the  
> > simulation, will certainly not be zero (or  3e-17, for that matter).  
> > So this Std. Error does not provide what I, naively, think should be  
> > given as a measure of the uncertainties or errors in the fitted  
> > coefficients...
> 
> You are trying to impose an error structure on a data situation that  
> you constructed artificially to be perfect.
> 
> >
> > What am I not getting right??
> 
> That if you input "perfection" into R's linear regression program, you  
> get appropriate warnings?
> 
> >
> > Thanks and sorry for the naive and non-expert question!
> 
> You are a Professor of physics, right? You do experiments, right? You  
> replicate them.  S0 perhaps I'm the one who should be puzzled.

Unfortunately you eschewed answering objectively any of my questions; I insist
they do make sense. Don't mention the data are perfect; this does not help to
make any progress in understanding the choice of convenient summary info the lm
method provides, as compared to what, in my humble opinion and in this specific
particular case, it should provide: the covariance matrix of the estimated
coefficients...

> 
> > -- 
> > #######################################
> > Prof. Mauricio Ortiz Calvao
> > Federal University of Rio de Janeiro
> > Institute of Physics, P O Box 68528
> > CEP 21941-972 Rio de Janeiro, RJ
> > Brazil