[R] Possible bug of QR decomposition in package Matrix

C6H5NO2 c6h5no2 at gmail.com
Thu Aug 4 17:11:50 CEST 2011


Thank you, Martin.

Yes, on my computer (48Gb memory) it also gives a message "caught
segfault" as you got. So it is because the matrix size is too large.

I apologize for the offence I've caused.

2011/8/4 Martin Maechler <maechler at stat.math.ethz.ch>:
>>>>>> C6H5NO2  <c6h5no2 at gmail.com>
>>>>>>     on Thu, 4 Aug 2011 16:03:54 +0800 writes:
>
>    > Thank you very much, Josh!
>    > As you suggested, I will contact the developers of "Matrix".
>
>    > PS, C6 are just initial characters of my email account :-)
>
>    > Best wishes,
>    > C6
>
> well, as the posting guide (http://www.R-project.org/posting-guide.html)
> says, this is regarded as impolite by many,
> and if I wasn't one of the Matrix package authors,
> I would not spend time helping  'C6'  either.
>
>    > 2011/8/4 Joshua Wiley <jwiley.psych at gmail.com>:
>    >> Hi C6 (were C1 - 5 already taken in your family?),
>    >>
>    >> I downloaded your data and can replicate your problem.  R
>    >> ceases responding and terminates.  This does not occur with all
>    >> uses of qr on a dgCMatrix object.  I know nothing about sparse
>    >> matrices, but if you believe this should not be occurring, you
>    >> should contact the package maintainers.  Here is my
>    >> sessionInfo() (FYI, it would probably be helpful to report
>    >> yours also in case the issue is version dependent):
>    >>
>    >> R Under development (unstable) (2011-07-30 r56564) Platform:
>    >> x86_64-pc-mingw32/x64 (64-bit)
>    >>
>    >> locale: [1] LC_COLLATE=English_United States.1252 [2]
>    >> LC_CTYPE=English_United States.1252 [3]
>    >> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5]
>    >> LC_TIME=English_United States.1252
>    >>
>    >> attached base packages: [1] stats     graphics  grDevices utils
>    >>     datasets  methods   base
>    >>
>    >> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30
>    >>
>    >> loaded via a namespace (and not attached): [1] grid_2.14.0
>    >>  tools_2.14.0
>    >>
>    >> Cheers,
>    >>
>    >> Josh
>    >>
>    >> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 <c6h5no2 at gmail.com>
>    >> wrote:
>    >>> Hello R users,
>    >>>
>    >>> I am trying to give the QR decomposition for a large sparse
>    >>> matrix in the format of dgCMatrix. When I run qr function for
>    >>> this matrix, the R session simply stops and exits to the
>    >>> shell.  The matrix is of size 108595x108595, and it has
>    >>> 4866885 non-zeros. I did the experiment on windows 7 and linux
>    >>> mint 11 (both 64 bit), and the results are the same.
>    >>>
>    >>> I have uploaded my data file to
>    >>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I
>    >>> hope someone could kindly download it.  The code to see my
>    >>> problem is:
>
>    >>>  library(Matrix)
>    >>>  load("A.RData")
>    >>>  B <- qr(A)
>
>
>    >>> Best wishes, C6
>
> And what's the size of RAM your two computers have ??
> The answer is of quite some importance.
>
> Short answer: If you have a large very sparse matrix,
> you don't know if the QR decomposition of that matrix is also
> very sparse... and if it ain't it will blow up memory,
> and that's what I'm pretty sure happened with you.
>
> What I don't see is why R "simply stops" for you and does not
> through a an error message about insufficient memory.
> As I show below, I do get a seg.fault
> --- which may be considered a bug ---
> *BUT* I do get the message about memory problems.
> Did you really *not* get any such message?
> Is it because you've used a GUI that hides such valuable
> information from the user?
>
> Here's the more detailed reason / analysis about why the above
> "kills R". This is commented R code,
> you can cut paste after you've got 'A' :
>
> str(A)
> ## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ##   ..@ i       : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ...
> ##   ..@ p       : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ...
> ##   ..@ Dim     : int [1:2] 108595 108595
> ##   ..@ Dimnames:List of 2
> ##   .. ..$ : NULL
> ##   .. ..$ : NULL
> ##   ..@ x       : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ...
> ##   ..@ factors : list()
>
> system.time(# the following is still not as fast as it could be):
> isSymmetric(A) # yes !
> )# 1.13 {the *2nd* time; 1.9 the 1st time !}
>
> ## First work with a  submatrix:
> n <- 10000
> A1 <- A[1:n, 1:n]
> system.time(
>            qr1 <- qr(A1))# on cmath-8, machine with 48 GB  RAM memory
> ##   user  system elapsed
> ## 59.884   0.316  60.240  !!
>
> str(qr1)
> ## Formal class 'sparseQR' [package "Matrix"] with 6 slots
> ##   ..@ V   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ##   .. .. ..@ i       : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ...
> ##   .. .. ..@ p       : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ...
> ##   .. .. ..@ Dim     : int [1:2] 10000 10000
> ##   .. .. ..@ Dimnames:List of 2
> ##   .. .. .. ..$ : NULL
> ##   .. .. .. ..$ : NULL
> ##   .. .. ..@ x       : num [1:7692948] 1 1 -3.71 8.68 -8.68 ...
> ##   .. .. ..@ factors : list()
> ##   ..@ beta: num [1:10000] 0 0 0.01216 0.00743 0.01758 ...
> ##   ..@ p   : int [1:10000] 61 62 63 64 94 80 161 162 163 164 ...
> ##   ..@ R   :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ##   .. .. ..@ i       : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ...
> ##   .. .. ..@ p       : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ...
> ##   .. .. ..@ Dim     : int [1:2] 10000 10000
> ##   .. .. ..@ Dimnames:List of 2
> ##   .. .. .. ..$ : NULL
> ##   .. .. .. ..$ : NULL
> ##   .. .. ..@ x       : num [1:13581659] 370.37 3.47 22.14 12.48 17.12 ...
> ##   .. .. ..@ factors : list()
> ##   ..@ q   : int [1:10000] 61 62 63 64 80 94 161 162 163 164 ...
> ##   ..@ Dim : int [1:2] 10000 10000
>
> object.size(A1)
> ##   4352184 bytes
> object.size(qr1)
> ## 255539456 bytes, i.e. 255 MB
> c(object.size(qr1) / object.size(A1)) ## 58.715
> c(object.size(A)   / object.size(A1)) ## 13.52
> ##--> "predicted size" of  qr(A):
> c(object.size(A) / object.size(A1))*object.size(qr1)
> ## ~ 3 G Bytes
>
> n <- 20000
> A2 <- A[1:n, 1:n]
> system.time(
>            qr2 <- qr(A2))# on cmath-8, machine with 48 GB  RAM memory
> ##     user   system  elapsed
> ## 1024.068    2.850 1027.488  -- 17 minutes
>
> object.size(A2)
> ##   8504464 bytes
> object.size(qr2)
> ## 1432'809992 bytes, i.e. 1432.81 MBytes
> c(object.size(qr2) / object.size(A2)) ## 168.4774
> ##--> "predicted size" of  qr(A):
> c(object.size(A) / object.size(A2) *object.size(qr2))
> ## 9912'944757 == 9912.944757 MBytes ~= 10 GBytes --- this will not fit!
>
> ## Ok: one step further
> n <- 30000
> A3 <- A[1:n, 1:n]
> system.time(
>            qr3 <- qr(A3))# on cmath-8, machine with 48 GB  RAM memory
> ##     user   system  elapsed
> ## 3384.183   32.234 3418.392 -- almost one hour !
>
> object.size(A3)
> ##   11'335112 bytes
> object.size(qr3)
> ## 3059'252216 bytes, i.e. 3059 MBytes
> c(object.size(qr3) / object.size(A3)) ## 269.9
> ##--> "predicted size" of  qr(A):
> c(object.size(A) / object.size(A3) *object.size(qr3))
> ##  1.588e+10 --- ~ 15 GB -- this is *MORE* than an R object can contain:
> .Machine$integer.max
> ## 2147483647 = 2.147'483'647 e9
>
> system.time(ch1 <- chol(A1))
> ## CHOLMOD warning:
> ## Error in .local(x, ...) : CHOLMOD factorization was unsuccessful
> ## In addition: Warning message:
> ## In .local(x, ...) :
> ##   Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c,
>
> system.time(lu1 <- lu(A1))
> ## Error ... : cs_lu(A) failed: near-singular A (or out of memory)
>
> ##--- Ok, now try the full thing, and see if  "R dies without a word"
> ## or if it at least says something before death :
> system.time(
>            qrA <- qr(A)
> )
> ##
> ##  *** caught segfault ***
> ## address 0x7f3f5f48cf70, cause 'memory not mapped'
> ## /u/maechler/bin/R_arg: line 137: 15063 Segmentation fault      $exe $@
>



More information about the R-help mailing list