[R] Problem with lm.resid() when weights are provided

Hamed Ha h@medh@@eli @ending from gm@il@com
Tue Sep 11 14:38:38 CEST 2018


Dear R Help Team.

I get some unexpected results when I use the lm function with weights. The
issue can be reproduced with the example below:


The input data are as follows (the weights are intentionally designed to
reflect some structure in the data):


> df
          y           x       weight
 1.51156139  0.55209240 2.117337e-34
-0.63653132 -0.12599316 2.117337e-34
 0.37782776  0.42095384 4.934135e-31
 3.03792318  1.40315446 2.679495e-24
 1.53646523  0.46076858 2.679495e-24
-2.37727874 -0.73963576 6.244160e-21
 0.37183065  0.20407468 1.455107e-17
-1.53917553 -0.95519361 1.455107e-17
 1.10926675  0.03897129 3.390908e-14
-0.37786333 -0.17523593 3.390908e-14
 2.43973603  0.97970095 7.902000e-11
-0.35432394 -0.03742559 7.902000e-11
 2.19296613  1.00355263 4.289362e-04
 0.49845532  0.34816207 4.289362e-04
 1.25005260  0.76306225 5.000000e-01
 0.84360691  0.45152356 5.000000e-01
 0.29565993  0.53880068 5.000000e-01
-0.54081334 -0.28104525 5.000000e-01
 0.83612836 -0.12885659 9.995711e-01
-1.42526769 -0.87107631 9.999998e-01
 0.10204789 -0.11649899 1.000000e+00
 1.14292898  0.37249631 1.000000e+00
-3.02942081 -1.28966997 1.000000e+00
-1.37549764 -0.74676145 1.000000e+00
-2.00118016 -0.55182759 1.000000e+00
-4.24441674 -1.94603608 1.000000e+00
 1.17168144  1.00868008 1.000000e+00
 2.64007761  1.26333069 1.000000e+00
 1.98550114  1.18509599 1.000000e+00
-0.58941683 -0.61972416 9.999998e-01
-4.57559611 -2.30914920 9.995711e-01
-0.82610544 -0.39347576 9.995711e-01
-0.02768220  0.20076910 9.995711e-01
 0.78186399  0.25690215 9.995711e-01
-0.88314153 -0.20200148 5.000000e-01
-4.17076452 -2.03547588 5.000000e-01
 0.93373070  0.54190626 4.289362e-04
-0.08517734  0.17692491 4.289362e-04
-4.47546619 -2.14876688 4.289362e-04
-1.65509103 -0.76898087 4.289362e-04
-0.39403030 -0.12689705 4.289362e-04
 0.01203300 -0.18689898 1.841442e-07
-4.82762639 -2.31391121 1.841442e-07
-0.72658380 -0.39751171 3.397282e-14
-2.35886866 -1.01082109 0.000000e+00
-2.03762707 -0.96439902 0.000000e+00
 0.90115123  0.60172286 0.000000e+00
 1.55999194  0.83433953 0.000000e+00
 3.07994058  1.30942776 0.000000e+00
 1.78871462  1.10605530 0.000000e+00



Running a simple linear model returns:

> lm(y~x,data=df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
   -0.04173      2.03790

and
> max(resid(lm(y~x,data=df)))
[1] 1.14046


*However, if I fit the weighted model, I get:*

lm(formula = y ~ x, data = df, weights = df$weights)

Coefficients:
(Intercept)            x
   -0.05786      1.96087

and
> max(resid(lm(y~x,data=df,weights=df$weights)))
[1] 60.91888


As you can see, the coefficient estimates are nearly the same, but the
resid() function returns a very large residual (in some of my cases the
value is much higher still). Further, if I calculate the residuals
directly as predict(lm(y~x,data=df,weights=df$weights))-df$y, then I get the
values I expect for the residuals.
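
The comparison described above can be sketched as a self-contained check. This is only an illustration on simulated data, not the data listed in this message: the names dat, w, r_reported, r_manual and r_weighted are hypothetical, and the weights (a few extremely small values next to weights of 1) are an assumption chosen to resemble the pattern in the table. The manual residuals use the usual convention of observed minus predicted.

```r
# Simulated stand-in data (assumption: NOT the df shown in this message)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)
w <- c(rep(1e-34, 5), rep(1, n - 5))   # a few extremely small weights
dat <- data.frame(y = y, x = x, w = w)

# Weighted fit; 'w' is looked up as a column of 'dat'
fit <- lm(y ~ x, data = dat, weights = w)

r_reported <- resid(fit)                # residuals as stored in the fit
r_manual   <- dat$y - predict(fit)      # observed minus X %*% coefficients
r_weighted <- weighted.residuals(fit)   # residuals scaled by sqrt(weight)

# Largest disagreement between the two ways of getting residuals
print(max(abs(r_reported - r_manual)))

# Show the rows with the smallest weights side by side
print(cbind(weight = dat$w, reported = r_reported, manual = r_manual)[1:5, ])
```

If the two columns disagree only in the rows with tiny weights, that points at those observations rather than at the coefficient estimates.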


Thanks.

Please do not hesitate to contact me for more details.
Regards,
Hamed.
