R-alpha: model.matrix misbehaves on two-sided formulas

Douglas Bates bates@stat.wisc.edu
10 Sep 1997 17:03:04 -0500


I am using R for a graduate class on linear models that I teach.  I
sent the class a sample transcript of explicit construction of the
model matrix and use of the QR decomposition.  One of the members of
the class picked up on the fact that the model matrix was not what I
advertised that it should be when a two-sided formula is used.  A
one-sided formula works properly.  For example,
 > library(st849)          # loads data sets from the text
 > e1.1
       density speed
  [1,]    20.4  38.8
  [2,]    27.4  31.5
  [3,]   106.2  10.6
  [4,]    80.4  16.1
  [5,]   141.3   7.7
  [6,]   130.9   8.3
  [7,]   121.7   8.5
  [8,]   106.5  11.1
  [9,]   130.5   8.6
 [10,]   101.1  11.1
 [11,]   123.9   9.8
 [12,]   144.2   7.8
 [13,]    29.5  31.8
 [14,]    30.8  31.6
 [15,]    26.5  34.0
 [16,]    35.7  28.9
 [17,]    30.0  28.8
 [18,]   106.2  10.5
 [19,]    97.0  12.3
 [20,]    90.1  13.2
 [21,]   106.7  11.4
 [22,]    99.3  11.2
 [23,]   107.2  10.3
 [24,]   109.1  11.4
 > class(e1.1)
 [1] "data.frame"
 > model.matrix(~density, data = e1.1)  # what we would expect
       (Intercept) density
  [1,]           1    20.4
  [2,]           1    27.4
  [3,]           1   106.2
  [4,]           1    80.4
  [5,]           1   141.3
  [6,]           1   130.9
  [7,]           1   121.7
  [8,]           1   106.5
  [9,]           1   130.5
 [10,]           1   101.1
 [11,]           1   123.9
 [12,]           1   144.2
 [13,]           1    29.5
 [14,]           1    30.8
 [15,]           1    26.5
 [16,]           1    35.7
 [17,]           1    30.0
 [18,]           1   106.2
 [19,]           1    97.0
 [20,]           1    90.1
 [21,]           1   106.7
 [22,]           1    99.3
 [23,]           1   107.2
 [24,]           1   109.1
 attr(,"assign")
 [1] 0 1
 > model.matrix(sqrt(speed) ~ density, data = e1.1) # not what we want
       (Intercept) density
  [1,]           1    38.8
  [2,]           1    31.5
  [3,]           1    10.6
  [4,]           1    16.1
  [5,]           1     7.7
  [6,]           1     8.3
  [7,]           1     8.5
  [8,]           1    11.1
  [9,]           1     8.6
 [10,]           1    11.1
 [11,]           1     9.8
 [12,]           1     7.8
 [13,]           1    31.8
 [14,]           1    31.6
 [15,]           1    34.0
 [16,]           1    28.9
 [17,]           1    28.8
 [18,]           1    10.5
 [19,]           1    12.3
 [20,]           1    13.2
 [21,]           1    11.4
 [22,]           1    11.2
 [23,]           1    10.3
 [24,]           1    11.4
 attr(,"assign")
 [1] 0 1

I think that somewhere along the line an expression like
 eval(form[[2]], data)
is being used when it would be better to use
 eval(form[[length(form)]], data)
-- 
Douglas Bates                            bates@stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-