[R] Simple lm/regression question

James Annan jdannan at jamstec.go.jp
Mon Feb 6 03:28:22 CET 2012


I am trying to use lm for a simple linear fit with weights. The results 
I get from IDL (which I am more familiar with) seem correct and 
intuitive, but the "lm" function in R gives outputs that seem strange to me.

Unweighted case:

 > x<-1:4
 > y<-(1:4)^2
 > summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
  1  2  3  4
  1 -1 -1  1

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *

So far, so good - IDL does much the same:

IDL> vec=linfit(x,y,sigma=sig)
IDL> print,vec,sig
      -5.00000      5.00000
       1.73205     0.632456

Now, if the dependent variable has known (measurement) uncertainties 
(10, say) then it is appropriate to use weights defined as the inverse 
of the variances, right?

 > summary(lm(y~x,weights=c(.01,.01,.01,.01)))

Call:
lm(formula = y ~ x, weights = c(0.01, 0.01, 0.01, 0.01))

Residuals:
    1    2    3    4
  0.1 -0.1 -0.1  0.1

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *

But here the *residuals* have changed and are no longer the actual 
response minus fitted values. (They seem to be scaled by the sqrt of the 
weights). The uncertainties on the parameter estimates, however, have 
*not* changed, which seems very odd to me.

The behaviour of IDL is rather different and intuitive to me:

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
IDL> print,vec,sig
      -5.00000      5.00000
       1.22474     0.447214

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
IDL> print,vec,sig
      -5.00000      5.00000
       12.2474      4.47214

Note that the uncertainties are 10* larger when the errors are 10* larger.

My question is really how to replicate these results in R. I suspect 
there is some terminology or convention that I'm confused by. But in the 
lm help page documentation, even the straightforward definition of 
"residual" seems incompatible with what the code actually returns:

	residuals: the residuals, that is response minus fitted values.

But, in the weighted case, this is not actually true...

James
-- 
James D Annan jdannan at jamstec.go.jp Tel: +81-45-778-5618 (Fax 5707)
Senior Scientist, Research Institute for Global Change, JAMSTEC
Yokohama Institute for Earth Sciences, 3173-25 Showamachi,
Kanazawa-ku, Yokohama City, Kanagawa, 236-0001 Japan
http://www.jamstec.go.jp/frcgc/research/d5/jdannan/



More information about the R-help mailing list