[R] Possible bug of QR decomposition in package Matrix

Martin Maechler maechler at stat.math.ethz.ch
Thu Aug 4 15:05:24 CEST 2011


>>>>> C6H5NO2  <c6h5no2 at gmail.com>
>>>>>     on Thu, 4 Aug 2011 16:03:54 +0800 writes:

    > Thank you very much, Josh!
    > As you suggested, I will contact the developers of "Matrix".

    > PS, C6 are just initial characters of my email account :-)

    > Best wishes,
    > C6

well, as the posting guide (http://www.R-project.org/posting-guide.html)
says, this is regarded as impolite by many,
and if I wasn't one of the Matrix package authors,
I would not spend time helping  'C6'  either.

    > 2011/8/4 Joshua Wiley <jwiley.psych at gmail.com>:
    >> Hi C6 (were C1 - 5 already taken in your family?),
    >> 
    >> I downloaded your data and can replicate your problem.  R
    >> ceases responding and terminates.  This does not occur with all
    >> uses of qr on a dgCMatrix object.  I know nothing about sparse
    >> matrices, but if you believe this should not be occurring, you
    >> should contact the package maintainers.  Here is my
    >> sessionInfo() (FYI, it would probably be helpful to report
    >> yours also in case the issue is version dependent):
    >> 
    >> R Under development (unstable) (2011-07-30 r56564) Platform:
    >> x86_64-pc-mingw32/x64 (64-bit)
    >> 
    >> locale: [1] LC_COLLATE=English_United States.1252 [2]
    >> LC_CTYPE=English_United States.1252 [3]
    >> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5]
    >> LC_TIME=English_United States.1252
    >> 
    >> attached base packages: [1] stats     graphics  grDevices utils
    >>     datasets  methods   base
    >> 
    >> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30
    >> 
    >> loaded via a namespace (and not attached): [1] grid_2.14.0
    >>  tools_2.14.0
    >> 
    >> Cheers,
    >> 
    >> Josh
    >> 
    >> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 <c6h5no2 at gmail.com>
    >> wrote:
    >>> Hello R users,
    >>> 
    >>> I am trying to give the QR decomposition for a large sparse
    >>> matrix in the format of dgCMatrix. When I run qr function for
    >>> this matrix, the R session simply stops and exits to the
    >>> shell.  The matrix is of size 108595x108595, and it has
    >>> 4866885 non-zeros. I did the experiment on windows 7 and linux
    >>> mint 11 (both 64 bit), and the results are the same.
    >>> 
    >>> I have uploaded my data file to
    >>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I
    >>> hope someone could kindly download it.  The code to see my
    >>> problem is: 

    >>>  library(Matrix) 
    >>>  load("A.RData") 
    >>>  B <- qr(A)


    >>> Best wishes, C6

And what's the size of RAM your two computers have ??
The answer is of quite some importance.

Short answer: If you have a large very sparse matrix, 
you don't know if the QR decomposition of that matrix is also 
very sparse... and if it ain't it will blow up memory,
and that's what I'm pretty sure happened with you.

What I don't see is why R "simply stops" for you and does not
through a an error message about insufficient memory.
As I show below, I do get a seg.fault 
--- which may be considered a bug ---
*BUT* I do get the message about memory problems.
Did you really *not* get any such message?
Is it because you've used a GUI that hides such valuable
information from the user?

Here's the more detailed reason / analysis about why the above
"kills R". This is commented R code, 
you can cut paste after you've got 'A' : 

str(A)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ...
##   ..@ p       : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ...
##   ..@ Dim     : int [1:2] 108595 108595
##   ..@ Dimnames:List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##   ..@ x       : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ...
##   ..@ factors : list()

system.time(# the following is still not as fast as it could be):
isSymmetric(A) # yes !
)# 1.13 {the *2nd* time; 1.9 the 1st time !}

## First work with a  submatrix:
n <- 10000
A1 <- A[1:n, 1:n]
system.time(
            qr1 <- qr(A1))# on cmath-8, machine with 48 GB  RAM memory
##   user  system elapsed
## 59.884   0.316  60.240  !!

str(qr1)
## Formal class 'sparseQR' [package "Matrix"] with 6 slots
##   ..@ V   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i       : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ...
##   .. .. ..@ p       : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ...
##   .. .. ..@ Dim     : int [1:2] 10000 10000
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ x       : num [1:7692948] 1 1 -3.71 8.68 -8.68 ...
##   .. .. ..@ factors : list()
##   ..@ beta: num [1:10000] 0 0 0.01216 0.00743 0.01758 ...
##   ..@ p   : int [1:10000] 61 62 63 64 94 80 161 162 163 164 ...
##   ..@ R   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i       : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ...
##   .. .. ..@ p       : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ...
##   .. .. ..@ Dim     : int [1:2] 10000 10000
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ x       : num [1:13581659] 370.37 3.47 22.14 12.48 17.12 ...
##   .. .. ..@ factors : list()
##   ..@ q   : int [1:10000] 61 62 63 64 80 94 161 162 163 164 ...
##   ..@ Dim : int [1:2] 10000 10000

object.size(A1)
##   4352184 bytes
object.size(qr1)
## 255539456 bytes, i.e. 255 MB
c(object.size(qr1) / object.size(A1)) ## 58.715
c(object.size(A)   / object.size(A1)) ## 13.52
##--> "predicted size" of  qr(A):
c(object.size(A) / object.size(A1))*object.size(qr1)
## ~ 3 G Bytes

n <- 20000
A2 <- A[1:n, 1:n]
system.time(
            qr2 <- qr(A2))# on cmath-8, machine with 48 GB  RAM memory
##     user   system  elapsed
## 1024.068    2.850 1027.488  -- 17 minutes

object.size(A2)
##   8504464 bytes
object.size(qr2)
## 1432'809992 bytes, i.e. 1432.81 MBytes
c(object.size(qr2) / object.size(A2)) ## 168.4774
##--> "predicted size" of  qr(A):
c(object.size(A) / object.size(A2) *object.size(qr2))
## 9912'944757 == 9912.944757 MBytes ~= 10 GBytes --- this will not fit!

## Ok: one step further
n <- 30000
A3 <- A[1:n, 1:n]
system.time(
            qr3 <- qr(A3))# on cmath-8, machine with 48 GB  RAM memory
##     user   system  elapsed
## 3384.183   32.234 3418.392 -- almost one hour !

object.size(A3)
##   11'335112 bytes
object.size(qr3)
## 3059'252216 bytes, i.e. 3059 MBytes
c(object.size(qr3) / object.size(A3)) ## 269.9
##--> "predicted size" of  qr(A):
c(object.size(A) / object.size(A3) *object.size(qr3))
##  1.588e+10 --- ~ 15 GB -- this is *MORE* than an R object can contain:
.Machine$integer.max
## 2147483647 = 2.147'483'647 e9

system.time(ch1 <- chol(A1))
## CHOLMOD warning:
## Error in .local(x, ...) : CHOLMOD factorization was unsuccessful
## In addition: Warning message:
## In .local(x, ...) :
##   Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c,

system.time(lu1 <- lu(A1))
## Error ... : cs_lu(A) failed: near-singular A (or out of memory)

##--- Ok, now try the full thing, and see if  "R dies without a word"
## or if it at least says something before death :
system.time(
            qrA <- qr(A)
)
## 
##  *** caught segfault ***
## address 0x7f3f5f48cf70, cause 'memory not mapped'
## /u/maechler/bin/R_arg: line 137: 15063 Segmentation fault      $exe $@



More information about the R-help mailing list