[Rd] Faulty handling of aliasing in predict.lm; wrong constant (PR#600)
john.maindonald@anu.edu.au
john.maindonald@anu.edu.au
Wed, 12 Jul 2000 06:41:45 +0200 (MET DST)
There are several residual problems in predict.lm
(1) The constant is wrongly set; it should be the mean
of the values of the y-variable.
(2) Standard errors are wrongly calculated when columns
of X are aliased, or when part of a term is aliased.
(3) There is an associated column labelling issue.
I have sent Thomas Lumley code that fixes these problems,
and replaces looping by rows with matrix operations.
There is an issue of whether predicted values should be
set to zero or to NA for terms that are totally aliased.
> "xx" <-
+ structure(list(fac1 = structure(c(1, 2, 3, 1, 2, 3, 1, 2, 3,
+ 1, 2, 3, 1, 2, 3, 1, 2, 3), .Label = c("1", "2", "3"), class = "factor"),
+ fac2 = structure(c(1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1,
+ 3, 3, 1, 3, 3), .Label = c("1", "2", "3"), class = "factor"),
+ x1 = c(2, 6, 6, 8, 8, 10, 16, 6, 2, 6, 18, 4, 10, 6, 4, 10,
+ 14, 4), x2 = c(6, 25, 9, 8, 31, 29, 40, 8, 8, 6, 21, 20,
+ 22, 5, 15, 30, 22, 10), alias = c(7, 28, 12, 12, 35, 34,
+ 48, 11, 9, 9, 30, 22, 27, 8, 17, 35, 29, 12), x4 = c(16,
+ 48, 34, 46, 62, 73, 205, 36, 30, 40, 193, 43, 65, 44, 27,
+ 72, 98, 17), y = c(27, 103, 38, 35, 126, 119, 311, 34, 34,
+ 28, 95, 82, 92, 22, 62, 124, 94, 41)), .Names = c("fac1",
+ "fac2", "x1", "x2", "alias", "x4", "y"), row.names = c("1", "2",
+ "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
+ "15", "16", "17", "18"), class = "data.frame")
>
> # fac2 is partly aliased with fac1
> # alias is aliased with x1 and x2
>
> xx
fac1 fac2 x1 x2 alias x4 y
1 1 1 2 6 7 16 27
2 2 2 6 25 28 48 103
3 3 2 6 9 12 34 38
4 1 1 8 8 12 46 35
5 2 2 8 31 35 62 126
6 3 2 10 29 34 73 119
7 1 1 16 40 48 205 311
8 2 2 6 8 11 36 34
9 3 2 2 8 9 30 34
10 1 1 6 6 9 40 28
11 2 3 18 21 30 193 95
12 3 3 4 20 22 43 82
13 1 1 10 22 27 65 92
14 2 3 6 5 8 44 22
15 3 3 4 15 17 27 62
16 1 1 10 30 35 72 124
17 2 3 14 22 29 98 94
18 3 3 4 10 12 17 41
> xx.lm<-lm(y~x1+x2+alias+x4,data=xx)
> coef(xx.lm)
(Intercept) x1 x2 alias x4
-1.9862799 -7.9406705 4.6773165 NA 0.9931177
> xx0.lm<-lm(y~x1+x2+x4,data=xx)
>
> predict(xx0.lm,se=T)$se[1:2]
[1] 9.874704 8.793774
> predict(xx.lm,se=T)$se[1:2] # Should be the same as above
[1] 10.939605 9.329757
>
> predict(xx.lm,type="terms")[1:2,]
x1 x2 alias x4
1 45.87943 -53.78914 -47.50413 NA
2 14.11675 35.07987 -15.72436 NA
>
> attributes(predict(xx.lm,type="terms"))$constant
(Intercept)
-1.98628
>
> predict(xx0.lm,type="terms",se=T)$se[1:2,]
x1 x2 x4
1 17.820298 7.791400 12.783126
2 5.483169 5.081348 4.231348
> predict(xx.lm,type="terms",se=T)$se[1:2,] # Should be the same as above
x1 x2 alias x4
1 9.613248e+15 3.826812e+16 12.797489 NA
2 2.957923e+15 2.495747e+16 4.236103 NA
>
> xx
fac1 fac2 x1 x2 alias x4 y
1 1 1 2 6 7 16 27
2 2 2 6 25 28 48 103
3 3 2 6 9 12 34 38
4 1 1 8 8 12 46 35
5 2 2 8 31 35 62 126
6 3 2 10 29 34 73 119
7 1 1 16 40 48 205 311
8 2 2 6 8 11 36 34
9 3 2 2 8 9 30 34
10 1 1 6 6 9 40 28
11 2 3 18 21 30 193 95
12 3 3 4 20 22 43 82
13 1 1 10 22 27 65 92
14 2 3 6 5 8 44 22
15 3 3 4 15 17 27 62
16 1 1 10 30 35 72 124
17 2 3 14 22 29 98 94
18 3 3 4 10 12 17 41
--please do not edit the information below--
Version:
platform = Windows
arch = x86
os = Win32
system = x86, Win32
status =
major = 1
minor = 1.0
year = 2000
month = June
day = 15
language = R
Windows 9x 4.10 (build 2222) A
Search Path:
.GlobalEnv, Autoloads, package:base
John Maindonald email : john.maindonald@anu.edu.au
Statistical Consulting Unit, phone : (6249)3998
c/o CMA, SMS, fax : (6249)5549
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._