[Rd] Inconsistent rank in qr()
Martin Maechler
maechler at stat.math.ethz.ch
Tue Jan 23 08:47:01 CET 2018
>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>> on Mon, 22 Jan 2018 17:57:47 +0100 writes:
> On 22/01/2018 at 17:40, Keith O'Hara wrote:
>> This behavior is noted in the qr documentation, no?
>>
>> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
> For me, a "full rank matrix" is a matrix whose rank is indeed min(nrow(A), ncol(A)),
> but here the meaning of "always full rank" is somewhat confusing. Does it mean
> that only full-rank matrices may be submitted to qr() when LAPACK=TRUE?
> Maybe there is a jargon in which "full rank" is a synonym for min(nrow(A), ncol(A)) for any matrix,
> but the fix to stick with the commonly accepted definition of rank (i.e. the number of linearly independent
> columns in A) is so easy. Why exclude the LAPACK case from it (even if this is properly documented)?
Because 99.5% of callers of qr() never look at '$rank',
so why should we compute it every time qr() is called?
==> Matrix :: rankMatrix() does use "qr" as one of its several methods.
--------------
As wiser people than me have said (I'm paraphrasing, as I can't find a nice citation):
While the rank of a matrix is a very well defined concept in
mathematics (theory), its practical computation on a finite
precision computer is much more challenging.
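
A small illustration of this point, as a sketch only: the matrix and the two tolerance values below are chosen here for the example and do not come from the thread. The two columns are linearly independent, so the matrix mathematically has rank 2, yet they are nearly collinear, and the rank reported by the default (LINPACK-based) qr() depends on the 'tol' argument.

```r
# Two nearly collinear columns: rank 2 in exact arithmetic.
eps <- 1e-9                           # illustrative perturbation (assumption)
a <- cbind(c(1, 1), c(1, 1 + eps))

# Same matrix, two tolerances for detecting linear dependence.
rank_loose <- qr(a, tol = 1e-7)$rank  # 1e-7 is the documented default of qr()
rank_tight <- qr(a, tol = 1e-12)$rank # much stricter tolerance

print(c(loose = rank_loose, tight = rank_tight))
```

Neither answer is "wrong": each one answers the question "what is the rank, up to this tolerance?".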
The ?rankMatrix help page (package Matrix, part of your R)
https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
starts with the following 'Description'
__ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambiguous in practice. We provide several methods, the default corresponding to Matlab's definition.
__ (*) The rank of an n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
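
For readers who do need a rank, a minimal usage sketch of rankMatrix() follows. It assumes the Matrix package is installed (it ships with standard R distributions) and reuses the 2 x 2 example from the original report; the variable names are illustrative only.

```r
# The matrix from the original report: diag(2) with a zero in position [2, 2].
a <- diag(2)
a[2, 2] <- 0

# Default method (based on singular values, see ?rankMatrix).
r_default <- Matrix::rankMatrix(a)

# QR-based method, one of the several methods documented in ?rankMatrix.
r_qr <- Matrix::rankMatrix(a, method = "qr")

# The results carry attributes (method, tolerance); as.integer() drops them.
print(c(default = as.integer(r_default), qr = as.integer(r_qr)))
```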
>>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <sokol at insa-toulouse.fr> wrote:
>>>
>>> Hi,
>>>
>>> I have noticed different rank values calculated by qr() depending on
>>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>>> which is only occasionally a true rank.
>>>
>>> Wouldn't it be more consistent to replace the rank in the latter case by something
>>> based on the following pseudocode?
>>>
>>> d=abs(diag(qr))
>>> rank=sum(d >= d[1]*tol)
>>>
>>> Here, we rely on the fact that column pivoting is activated in the called LAPACK routine (dgeqp3)
>>> and that the diagonal terms of the qr matrix are put in decreasing order (according to their absolute values).
>>>
>>> Serguei.
>>>
>>> How to reproduce:
>>>
>>> a=diag(2)
>>> a[2,2]=0
>>> qaf=qr(a, LAPACK=FALSE)
>>> qaf$rank # shows 1. OK it's the true rank value
>>> qat=qr(a, LAPACK=TRUE)
>>> qat$rank #shows 2. Bad, it's not the expected value.
>>>
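
The pseudocode quoted above can be turned into a runnable sketch along the following lines. This is only an illustration of the proposal, not what qr() does: the helper name rank_from_pivoted_qr and the default tol = 1e-7 (picked to match the default 'tol' of qr()) are assumptions made here, and the guard for an all-zero matrix is an addition to the quoted pseudocode. The sketch relies, as the proposal does, on the column-pivoted LAPACK factorization returning diagonal entries of R in decreasing order of absolute value.

```r
# Hypothetical helper (not part of base R): estimate the rank from the
# column-pivoted QR decomposition computed by qr(x, LAPACK = TRUE).
rank_from_pivoted_qr <- function(x, tol = 1e-7) {
  qx <- qr(x, LAPACK = TRUE)
  d <- abs(diag(qx$qr))      # absolute values of the diagonal of R
  # Guard (an addition to the pseudocode): an all-zero matrix has rank 0.
  if (length(d) == 0L || d[1] == 0) return(0L)
  # Count diagonal entries that are not negligible relative to the largest.
  sum(d >= d[1] * tol)
}

# The example from the report.
a <- diag(2)
a[2, 2] <- 0

print(qr(a, LAPACK = FALSE)$rank)  # rank from the default LINPACK code path
print(qr(a, LAPACK = TRUE)$rank)   # documented as always full rank
print(rank_from_pivoted_qr(a))     # rank estimated from the diagonal of R
```

Computing the rank outside qr() in this way leaves the cost of qr() itself unchanged for callers who never look at '$rank'.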
> --
> Serguei Sokol
> Ingenieur de recherche INRA
> Cellule mathématique
> LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
> 135 Avenue de Rangueil
> 31077 Toulouse Cedex 04
> tel: +33 5 6155 9849
> email: sokol at insa-toulouse.fr
> http://www.lisbp.fr