[Rd] Inconsistent rank in qr()

Keith O'Hara keith.ohara at nyu.edu
Wed Jan 24 05:23:01 CET 2018


A simpler fix: return rank=NA when Lapack is used.

Keith

> On Jan 23, 2018, at 5:36 AM, Serguei Sokol <sokol at insa-toulouse.fr> wrote:
> 
> Le 23/01/2018 à 08:47, Martin Maechler a écrit :
>>>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes:
>>     > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
>>     >> This behavior is noted in the qr documentation, no?
>>     >>
>>     >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
>>     > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
>>     > but here the meaning of "always is full rank" is somewhat confusing. Does it mean
>>     > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
>>     > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
>>     > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
>>     > columns in A) is so easy. Why to discard lapack case from it (even properly documented)?
>> 
>> Because 99.5% of caller to qr()  never look at '$rank',
>> so why should we compute it every time qr() is called?
> 1. Because R already does it for linpack so it would be consistent to do so for lapack too.
> 2. Because R pretends that it is a part of a returned qr class.
> 3. Because its calculation is a negligible fraction of QR itself.
> 
>> 
>> ==> Matrix :: rankMatrix() does use "qr" as one of its several methods.
>> 
>> --------------
>> 
>> As wiser people than me have said (I'm paraphrasing, don't find a nice citation):
>> 
>>   While the rank of a matrix is a very well defined concept in
>>   mathematics (theory), its practical computation on a finite
>>   precision computer is much more challenging.
> True. It is indeed depending of round-off errors during QR calculations and tolerance
> setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank"
> is by far the most misleading choice to my mind.
> 
> Once again, if we are already calculating it for linpack let do it in most consistent
> way for lapack too. I can propose a patch if you will.
> 
> Serguei.
> 
>> 
>> The ?rankMatrix  help page (package Matrix, part of your R)
>>    https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
>> starts with the following 'Description'
>> 
>> __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition.
>> 
>> __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
>> 
>> 
>>     >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <sokol at insa-toulouse.fr> wrote:
>>     >>>
>>     >>> Hi,
>>     >>>
>>     >>> I have noticed different rank values calculated by qr() depending on
>>     >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>>     >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>>     >>> which is only occasionally a true rank.
>>     >>>
>>     >>> Would not it be more consistent to replace the rank in the latter case by something
>>     >>> based on the following pseudo code ?
>>     >>>
>>     >>> d=abs(diag(qr))
>>     >>> rank=sum(d >= d[1]*tol)
>>     >>>
>>     >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
>>     >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>>     >>>
>>     >>> Serguei.
>>     >>>
>>     >>> How to reproduce:
>>     >>>
>>     >>> a=diag(2)
>>     >>> a[2,2]=0
>>     >>> qaf=qr(a, LAPACK=FALSE)
>>     >>> qaf$rank # shows 1. OK it's the true rank value
>>     >>> qat=qr(a, LAPACK=TRUE)
>>     >>> qat$rank #shows 2. Bad, it's not the expected value.
>>     >>>
>> 
>>     > --
>>     > Serguei Sokol
>>     > Ingenieur de recherche INRA
>> 
>>     > Cellule mathématique
>>     > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
>>     > 135 Avenue de Rangueil
>>     > 31077 Toulouse Cedex 04
>> 
>>     > tel: +33 5 6155 9849
>>     > email: sokol at insa-toulouse.fr
>>     > http://www.lisbp.fr


	[[alternative HTML version deleted]]



More information about the R-devel mailing list