[Rd] qr.coef: permutes dimnames; inserts NA; promises minimum-length (PR#9623)

Thu Apr 19 17:03:06 CEST 2007

Full_Name: Christian Brechbuehler
Version: 2.4.1 Patched (2007-03-25 r40917)
OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS
Submission from: (NULL) (24.61.47.236)

Splus and R have different ideas about what qr.coef(qr()) should return,
which is fine... but I believe that R has a bug in that it is not
internally consistent, and another separate bug in the documentation. 

In particular, on this problem, Splus generates a permuted solution
vector:

Splus 6.2.1:
    > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
"one")))
    > A
         zero one 
    [1,]    0   1
    [2,]    0   1
    [3,]    0   1
    > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
    > y
         y 
    [1,] 6
    [2,] 7
    [3,] 8
    > x <- qr.coef(qr(A), y)
    > x
         y 
     one 7
    zero 0
    > 

To my taste, the answer from Splus is unexpected; I was looking for
this:

    > x[qr(A)$pivot,,drop=F]
         y 
    zero 0
     one 7

But at least the answer is internally consistent: the values and the
row names were scrambled in corresponding ways.

However, R seems to helpfully un-permute the data values, but forgets
to un-permute the dimnames, which seems broken.  

R version 2.4.1 Patched (2007-03-25 r40917)
    > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
"one")))
    > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
    > x <- qr.coef(qr(A), y)
    > x
          y
    one  NA
    zero  7

I think the answer from qr.coef() should look more like this:

    > x.wanted
          y
    zero NA
    one   7
    > 

In addition, R uses NA to fill in rather than zero.  NA for this
problem above might mean "undefined": any value will do.  But given
the application of this function, where the solution might be
multiplied by the matrix, any NA will cause that to turn into an NA
result... in math, zero times anything is zero, but in R, NA times
anything (even zero) is NA.  This seems somewhat inconvenient.  So I'd really
like
R to return this:

    > x
          y
    zero  0
    one   7

More on the scrambling; I see in dqrsl.f that:

c               if pivoting was requested in dqrdc, the j-th
c               component of b will be associated with column jpvt(j)
c               of the original matrix x that was input into dqrdc.)

which I think is referring to the helpful un-permuting, but I think
qr.coef() in R needs to correspondingly un-permute the dimnames as
well?

  Code from qr.coef: 
    if (!is.null(nam <- colnames(qr$qr))) 
        rownames(coef) <- nam
  Proposed fix: That second line could be changed to
        rownames(coef)[qr$pivot] <- nam

Another issue (either with the documentation or the code) is as follows.
In R, help(qr) promises:

   'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
   solves systems of equations via the QR decomposition: if 'a' is a
   QR decomposition it is the same as 'solve.qr', but if 'a' is a
   rectangular matrix the QR decomposition is computed first.  Either
   will handle over- and under-determined systems, providing a
   minimal-length solution or a least-squares fit if appropriate.

So unlike Splus, R promises a minimal-length solution, but I don't
think it delivers that.  For example, let's get the minimal-length
solution for the following under-determined system:

     x1 + 2*x2  ==  5

qr.solve would fail due to the singular matrix, so use qr.coef:

     > qr.coef(qr(t(1:2)),c(5))

   R:
     [1]  5 NA

   Splus:
     [1] 5 0

1*5 + 2*0 does equal 5, so modulo NA, this is a solution, but it is
NOT minimal-length.  OTOH, svd gives the right answer in both Splus and
R:

     > d <- svd(t(1:2));  c(5) %*% d$u %*% diag(1/d$d, length(d$d)) %*% t(d$v)
          [,1] [,2]
     [1,]    1    2

This *is* minimum length, and 1*1 + 2*2 == 5.  So either qr.coef(qr())
should deliver that, or the documentation should clarify when a
minimal-length solution is delivered?

And revisiting the NA issue... in this problem, for the qr.coef()
result to be a solution, that 2nd value MUST be zero, no other values
will work.  So the NA seems wrong here, more clearly than in the other
example.

Finally, a simpler test case that shows the same issues with a 2x1
example instead of 2x3; any fix to qr.coef() should handle this as
well:

> qr.coef(qr(matrix(0:1,1,dimnames=list(NULL, c("zero","one")))),5)
 one zero 
  NA    5 

I think that should return:

 zero  one
    0    5 

or at least:

 zero  one
   NA    5