[R] Possible bug of QR decomposition in package Matrix
C6H5NO2
c6h5no2 at gmail.com
Thu Aug 4 17:11:50 CEST 2011
Thank you, Martin.
Yes, on my computer (48Gb memory) it also gives a message "caught
segfault" as you got. So it is because the matrix size is too large.
I apologize for the offence I've caused.
2011/8/4 Martin Maechler <maechler at stat.math.ethz.ch>:
>>>>>> C6H5NO2 <c6h5no2 at gmail.com>
>>>>>> on Thu, 4 Aug 2011 16:03:54 +0800 writes:
>
> > Thank you very much, Josh!
> > As you suggested, I will contact the developers of "Matrix".
>
> > PS, C6 are just initial characters of my email account :-)
>
> > Best wishes,
> > C6
>
> well, as the posting guide (http://www.R-project.org/posting-guide.html)
> says, this is regarded as impolite by many,
> and if I wasn't one of the Matrix package authors,
> I would not spend time helping 'C6' either.
>
> > 2011/8/4 Joshua Wiley <jwiley.psych at gmail.com>:
> >> Hi C6 (were C1 - 5 already taken in your family?),
> >>
> >> I downloaded your data and can replicate your problem. R
> >> ceases responding and terminates. This does not occur with all
> >> uses of qr on a dgCMatrix object. I know nothing about sparse
> >> matrices, but if you believe this should not be occurring, you
> >> should contact the package maintainers. Here is my
> >> sessionInfo() (FYI, it would probably be helpful to report
> >> yours also in case the issue is version dependent):
> >>
> >> R Under development (unstable) (2011-07-30 r56564) Platform:
> >> x86_64-pc-mingw32/x64 (64-bit)
> >>
> >> locale: [1] LC_COLLATE=English_United States.1252 [2]
> >> LC_CTYPE=English_United States.1252 [3]
> >> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5]
> >> LC_TIME=English_United States.1252
> >>
> >> attached base packages: [1] stats graphics grDevices utils
> >> datasets methods base
> >>
> >> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30
> >>
> >> loaded via a namespace (and not attached): [1] grid_2.14.0
> >> tools_2.14.0
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 <c6h5no2 at gmail.com>
> >> wrote:
> >>> Hello R users,
> >>>
> >>> I am trying to give the QR decomposition for a large sparse
> >>> matrix in the format of dgCMatrix. When I run qr function for
> >>> this matrix, the R session simply stops and exits to the
> >>> shell. The matrix is of size 108595x108595, and it has
> >>> 4866885 non-zeros. I did the experiment on windows 7 and linux
> >>> mint 11 (both 64 bit), and the results are the same.
> >>>
> >>> I have uploaded my data file to
> >>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I
> >>> hope someone could kindly download it. The code to see my
> >>> problem is:
>
> >>> library(Matrix)
> >>> load("A.RData")
> >>> B <- qr(A)
>
>
> >>> Best wishes, C6
>
> And what's the size of RAM your two computers have ??
> The answer is of quite some importance.
>
> Short answer: If you have a large very sparse matrix,
> you don't know if the QR decomposition of that matrix is also
> very sparse... and if it ain't it will blow up memory,
> and that's what I'm pretty sure happened with you.
>
> What I don't see is why R "simply stops" for you and does not
> through a an error message about insufficient memory.
> As I show below, I do get a seg.fault
> --- which may be considered a bug ---
> *BUT* I do get the message about memory problems.
> Did you really *not* get any such message?
> Is it because you've used a GUI that hides such valuable
> information from the user?
>
> Here's the more detailed reason / analysis about why the above
> "kills R". This is commented R code,
> you can cut paste after you've got 'A' :
>
> str(A)
> ## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ## ..@ i : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ...
> ## ..@ p : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ...
> ## ..@ Dim : int [1:2] 108595 108595
> ## ..@ Dimnames:List of 2
> ## .. ..$ : NULL
> ## .. ..$ : NULL
> ## ..@ x : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ...
> ## ..@ factors : list()
>
> system.time(# the following is still not as fast as it could be):
> isSymmetric(A) # yes !
> )# 1.13 {the *2nd* time; 1.9 the 1st time !}
>
> ## First work with a submatrix:
> n <- 10000
> A1 <- A[1:n, 1:n]
> system.time(
> qr1 <- qr(A1))# on cmath-8, machine with 48 GB RAM memory
> ## user system elapsed
> ## 59.884 0.316 60.240 !!
>
> str(qr1)
> ## Formal class 'sparseQR' [package "Matrix"] with 6 slots
> ## ..@ V :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ## .. .. ..@ i : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ...
> ## .. .. ..@ p : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ...
> ## .. .. ..@ Dim : int [1:2] 10000 10000
> ## .. .. ..@ Dimnames:List of 2
> ## .. .. .. ..$ : NULL
> ## .. .. .. ..$ : NULL
> ## .. .. ..@ x : num [1:7692948] 1 1 -3.71 8.68 -8.68 ...
> ## .. .. ..@ factors : list()
> ## ..@ beta: num [1:10000] 0 0 0.01216 0.00743 0.01758 ...
> ## ..@ p : int [1:10000] 61 62 63 64 94 80 161 162 163 164 ...
> ## ..@ R :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ## .. .. ..@ i : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ...
> ## .. .. ..@ p : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ...
> ## .. .. ..@ Dim : int [1:2] 10000 10000
> ## .. .. ..@ Dimnames:List of 2
> ## .. .. .. ..$ : NULL
> ## .. .. .. ..$ : NULL
> ## .. .. ..@ x : num [1:13581659] 370.37 3.47 22.14 12.48 17.12 ...
> ## .. .. ..@ factors : list()
> ## ..@ q : int [1:10000] 61 62 63 64 80 94 161 162 163 164 ...
> ## ..@ Dim : int [1:2] 10000 10000
>
> object.size(A1)
> ## 4352184 bytes
> object.size(qr1)
> ## 255539456 bytes, i.e. 255 MB
> c(object.size(qr1) / object.size(A1)) ## 58.715
> c(object.size(A) / object.size(A1)) ## 13.52
> ##--> "predicted size" of qr(A):
> c(object.size(A) / object.size(A1))*object.size(qr1)
> ## ~ 3 G Bytes
>
> n <- 20000
> A2 <- A[1:n, 1:n]
> system.time(
> qr2 <- qr(A2))# on cmath-8, machine with 48 GB RAM memory
> ## user system elapsed
> ## 1024.068 2.850 1027.488 -- 17 minutes
>
> object.size(A2)
> ## 8504464 bytes
> object.size(qr2)
> ## 1432'809992 bytes, i.e. 1432.81 MBytes
> c(object.size(qr2) / object.size(A2)) ## 168.4774
> ##--> "predicted size" of qr(A):
> c(object.size(A) / object.size(A2) *object.size(qr2))
> ## 9912'944757 == 9912.944757 MBytes ~= 10 GBytes --- this will not fit!
>
> ## Ok: one step further
> n <- 30000
> A3 <- A[1:n, 1:n]
> system.time(
> qr3 <- qr(A3))# on cmath-8, machine with 48 GB RAM memory
> ## user system elapsed
> ## 3384.183 32.234 3418.392 -- almost one hour !
>
> object.size(A3)
> ## 11'335112 bytes
> object.size(qr3)
> ## 3059'252216 bytes, i.e. 3059 MBytes
> c(object.size(qr3) / object.size(A3)) ## 269.9
> ##--> "predicted size" of qr(A):
> c(object.size(A) / object.size(A3) *object.size(qr3))
> ## 1.588e+10 --- ~ 15 GB -- this is *MORE* than an R object can contain:
> .Machine$integer.max
> ## 2147483647 = 2.147'483'647 e9
>
> system.time(ch1 <- chol(A1))
> ## CHOLMOD warning:
> ## Error in .local(x, ...) : CHOLMOD factorization was unsuccessful
> ## In addition: Warning message:
> ## In .local(x, ...) :
> ## Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c,
>
> system.time(lu1 <- lu(A1))
> ## Error ... : cs_lu(A) failed: near-singular A (or out of memory)
>
> ##--- Ok, now try the full thing, and see if "R dies without a word"
> ## or if it at least says something before death :
> system.time(
> qrA <- qr(A)
> )
> ##
> ## *** caught segfault ***
> ## address 0x7f3f5f48cf70, cause 'memory not mapped'
> ## /u/maechler/bin/R_arg: line 137: 15063 Segmentation fault $exe $@
>
More information about the R-help
mailing list