[Rd] Faulty handling of aliasing in predict.lm; wrong constant (PR#600)

john.maindonald@anu.edu.au john.maindonald@anu.edu.au
Wed, 12 Jul 2000 06:41:45 +0200 (MET DST)


There are several residual problems in predict.lm
(1) The constant is wrongly set; it should be the mean
of the values of the y-variable.
(2) Standard errors are wrongly calculated when columns
of X are aliased, or when part of a term is aliased.
(3) There is an associated column labelling issue.

I have sent Thomas Lumley code that fixes these problems,
and replaces looping by rows with matrix operations.

There is an issue of whether predicted values should be
set to zero or to NA for terms that are totally aliased.

> "xx" <-
+ structure(list(fac1 = structure(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 
+ 1, 2, 3, 1, 2, 3, 1, 2, 3), .Label = c("1", "2", "3"), class = "factor"), 
+     fac2 = structure(c(1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1, 
+     3, 3, 1, 3, 3), .Label = c("1", "2", "3"), class = "factor"), 
+     x1 = c(2, 6, 6, 8, 8, 10, 16, 6, 2, 6, 18, 4, 10, 6, 4, 10, 
+     14, 4), x2 = c(6, 25, 9, 8, 31, 29, 40, 8, 8, 6, 21, 20, 
+     22, 5, 15, 30, 22, 10), alias = c(7, 28, 12, 12, 35, 34, 
+     48, 11, 9, 9, 30, 22, 27, 8, 17, 35, 29, 12), x4 = c(16, 
+     48, 34, 46, 62, 73, 205, 36, 30, 40, 193, 43, 65, 44, 27, 
+     72, 98, 17), y = c(27, 103, 38, 35, 126, 119, 311, 34, 34, 
+     28, 95, 82, 92, 22, 62, 124, 94, 41)), .Names = c("fac1", 
+ "fac2", "x1", "x2", "alias", "x4", "y"), row.names = c("1", "2", 
+ "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
+ "15", "16", "17", "18"), class = "data.frame")
> 
> # fac2 is partly aliased with fac1
> # alias is aliased with x1 and x2
> 
> xx
   fac1 fac2 x1 x2 alias  x4   y
1     1    1  2  6     7  16  27
2     2    2  6 25    28  48 103
3     3    2  6  9    12  34  38
4     1    1  8  8    12  46  35
5     2    2  8 31    35  62 126
6     3    2 10 29    34  73 119
7     1    1 16 40    48 205 311
8     2    2  6  8    11  36  34
9     3    2  2  8     9  30  34
10    1    1  6  6     9  40  28
11    2    3 18 21    30 193  95
12    3    3  4 20    22  43  82
13    1    1 10 22    27  65  92
14    2    3  6  5     8  44  22
15    3    3  4 15    17  27  62
16    1    1 10 30    35  72 124
17    2    3 14 22    29  98  94
18    3    3  4 10    12  17  41
> xx.lm<-lm(y~x1+x2+alias+x4,data=xx)
> coef(xx.lm)
(Intercept)          x1          x2       alias          x4 
 -1.9862799  -7.9406705   4.6773165          NA   0.9931177 
> xx0.lm<-lm(y~x1+x2+x4,data=xx)
> 
> predict(xx0.lm,se=T)$se[1:2]
[1] 9.874704 8.793774
> predict(xx.lm,se=T)$se[1:2]    # Should be the same as above
[1] 10.939605  9.329757
> 
> predict(xx.lm,type="terms")[1:2,]
        x1        x2     alias x4
1 45.87943 -53.78914 -47.50413 NA
2 14.11675  35.07987 -15.72436 NA
> 
> attributes(predict(xx.lm,type="terms"))$constant
(Intercept) 
   -1.98628 
> 
> predict(xx0.lm,type="terms",se=T)$se[1:2,]
         x1       x2        x4
1 17.820298 7.791400 12.783126
2  5.483169 5.081348  4.231348
> predict(xx.lm,type="terms",se=T)$se[1:2,]  # Should be the same as above
            x1           x2     alias x4
1 9.613248e+15 3.826812e+16 12.797489 NA
2 2.957923e+15 2.495747e+16  4.236103 NA
> 
> xx
   fac1 fac2 x1 x2 alias  x4   y
1     1    1  2  6     7  16  27
2     2    2  6 25    28  48 103
3     3    2  6  9    12  34  38
4     1    1  8  8    12  46  35
5     2    2  8 31    35  62 126
6     3    2 10 29    34  73 119
7     1    1 16 40    48 205 311
8     2    2  6  8    11  36  34
9     3    2  2  8     9  30  34
10    1    1  6  6     9  40  28
11    2    3 18 21    30 193  95
12    3    3  4 20    22  43  82
13    1    1 10 22    27  65  92
14    2    3  6  5     8  44  22
15    3    3  4 15    17  27  62
16    1    1 10 30    35  72 124
17    2    3 14 22    29  98  94
18    3    3  4 10    12  17  41



--please do not edit the information below--

Version:
 platform = Windows
 arch = x86
 os = Win32
 system = x86, Win32
 status = 
 major = 1
 minor = 1.0
 year = 2000
 month = June
 day = 15
 language = R

Windows 9x 4.10 (build 2222)  A 

Search Path:
 .GlobalEnv, Autoloads, package:base
John Maindonald               email : john.maindonald@anu.edu.au        
Statistical Consulting Unit,  phone : (6249)3998        
c/o CMA, SMS,                 fax   : (6249)5549  
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._