[R] naive "collinear" weighted linear regression
David Winsemius
dwinsemius at comcast.net
Thu Nov 12 04:06:13 CET 2009
On Nov 11, 2009, at 7:45 PM, Mauricio Calvao wrote:
> Hi there
>
> Sorry for what may be a naive or dumb question.
>
> I have the following data:
>
> > x <- c(1,2,3,4) # predictor vector
>
> > y <- c(2,4,6,8) # response vector. Notice that it is an exact,
> perfect straight line through the origin with slope equal to 2
>
> > error <- c(0.3,0.3,0.3,0.3) # I have (equal) ``errors'', for
> instance, in the measured responses
Which means those x, y, and "error" figures did not come from an
experiment, but rather from theory???
>
> Of course the best fit coefficients should be 0 for the intercept
> and 2 for the slope. Furthermore, it seems completely plausible (or
> not?) that, since the y_i have associated non-vanishing
> ``errors'' (dispersions), there should be corresponding non-
> vanishing ``errors'' associated with the best-fit coefficients, right?
>
> When I try:
>
> > fit_mod <- lm(y ~ x, weigths = 1/error^2)
>
> I get
>
> Warning message:
> In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
> extra arguments weigths are just disregarded.
(Actually, lm()'s weights are precision weights, e.g. inverse
variances, so 1/error^2 is the right idea; the warning is telling you
that the misspelled argument "weigths" was silently dropped, so no
weighting happened at all.)
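A minimal sketch with the correct spelling, re-entering your data
(the standard errors will still be effectively zero, because the fit
is exact):

> x <- c(1, 2, 3, 4)
> y <- c(2, 4, 6, 8)
> error <- rep(0.3, 4)                       # equal "errors"
> fit_mod <- lm(y ~ x, weights = 1/error^2)  # correct spelling: no warning
> summary(fit_mod)$coefficients              # Std. Error column still ~1e-16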
>
> Pressing on despite the warning message, which I did not quite
> understand, when I type:
>
> > summary(fit_mod)
>
> I get
>
> Call:
> lm(formula = y ~ x, weigths = 1/error^2)
>
> Residuals:
> 1 2 3 4
> -5.067e-17 8.445e-17 -1.689e-17 -1.689e-17
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 0.000e+00 8.776e-17 0.000e+00 1
> x 2.000e+00 3.205e-17 6.241e+16 <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 7.166e-17 on 2 degrees of freedom
> Multiple R-squared: 1, Adjusted R-squared: 1
> F-statistic: 3.895e+33 on 1 and 2 DF, p-value: < 2.2e-16
>
>
> Naively, should not the Std. Error column be different from zero?
> What I have in mind, which surely is not what Std. Error means, is
> that if I carried out a large simulation, treating each response y_i
> as a Gaussian random variable with mean y_i and standard deviation
> 2*error = 0.6, and then did an ordinary least-squares fit of the
> slope and intercept, I would end up with a mean for these simulated
> coefficients which should be 2 and 0, respectively,
Well, not precisely 2 and 0, but rather something very close ... i.e.,
within "experimental error". Please note that numbers on the order of
1e-16 are effectively zero from a numerical-analysis perspective.
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08
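At that scale even all.equal(), whose default tolerance is about
1.5e-8, treats your residuals as zero:

> all.equal(8.445e-17, 0)   # within the default tolerance
[1] TRUE
> 8.445e-17 == 0            # exact comparison still distinguishes them
[1] FALSE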
> and, that's the point, a non-vanishing standard deviation for these
> fitted coefficients, right? This is what I expected to serve as an
> estimate, or at least a good indicator, of the degree of uncertainty
> I should assign to the fitted coefficients; it seems to me these
> deviations, calculated as a result of the simulation, would certainly
> not be zero (or 3e-17, for that matter). So this Std. Error does not
> provide what I, naively, think should be given as a measure of the
> uncertainties or errors in the fitted coefficients...
You are trying to impose an error structure on a data situation that
you constructed artificially to be perfect.
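But run the simulation you propose and you will indeed see nonzero
spread, precisely because the simulated data are no longer perfect. A
minimal sketch (10000 replicates; the seed is arbitrary):

> set.seed(1)
> x <- c(1, 2, 3, 4)
> sims <- replicate(10000, {
+     y_sim <- 2 * x + rnorm(4, sd = 0.6)   # mean 2*x, sd = 2*error
+     coef(lm(y_sim ~ x))
+ })
> rowMeans(sims)       # close to (intercept = 0, slope = 2)
> apply(sims, 1, sd)   # clearly nonzero, as you expect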
>
> What am I not getting right??
That if you input "perfection" into R's linear regression program, you
get appropriate warnings?
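Add even a little noise to the responses and nonzero standard errors
reappear; for instance, continuing with x, y, and error from above
(the seed is arbitrary):

> set.seed(2)
> y_noisy <- y + rnorm(4, sd = 0.3)               # perturb the perfect responses
> summary(lm(y_noisy ~ x, weights = 1/error^2))   # Std. Errors now nonzero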
>
> Thanks and sorry for the naive and non-expert question!
You are a Professor of physics, right? You do experiments, right? You
replicate them. So perhaps I'm the one who should be puzzled.
> --
> #######################################
> Prof. Mauricio Ortiz Calvao
> Federal University of Rio de Janeiro
> Institute of Physics, P O Box 68528
> CEP 21941-972 Rio de Janeiro, RJ
> Brazil
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT