[R] Creating a sparse matrix from a file
Martin Maechler
maechler at stat.math.ethz.ch
Tue Oct 27 16:04:22 CET 2009
>>>>> "PP" == Pallavi P <pallavip.05 at gmail.com>
>>>>> on Tue, 27 Oct 2009 18:13:22 +0530 writes:
PP> Hi Martin,
PP> Thanks for the help. Just to make sure I understand correctly.
PP> The below steps are for creating an example table similar to the one that I
PP> read from file.
yes, exactly
n <- 22638
m <- 80914
nnz <- 300000 # no idea if this is realistic for you
set.seed(101)
ex <- cbind(i = sample(n,nnz, replace=TRUE),
            j = sample(m,nnz, replace=TRUE),
            x = round(100 * rnorm(nnz)))
PP> and I can understand the way sparseMatrix is initialized right now as
M <- sparseMatrix(i = ex[,"i"],
                  j = ex[,"j"],
                  x = ex[,"x"])
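For the original question (triplets stored in a file), the same call can be fed directly from read.table(). The following is only a minimal sketch: the temporary file, the column names i, j, x, and the 20 x 20 dimensions are illustrative assumptions, not taken from the poster's data.

```r
library(Matrix)

## Hypothetical input in "rowIndex,columnIndex,value" format;
## a temporary file stands in for the real one.
f <- tempfile(fileext = ".csv")
writeLines(c("1,2,14", "2,4,15"), f)

## Read the triplets; the column names are illustrative choices.
ex <- read.table(f, sep = ",", col.names = c("i", "j", "x"))

## Build the matrix in one vectorized call. 'dims' fixes the full size;
## without it the dimensions default to c(max(i), max(j)).
M <- sparseMatrix(i = ex$i, j = ex$j, x = as.numeric(ex$x),
                  dims = c(20, 20))

M[1, 2]
M[2, 4]
```

Passing 'dims' explicitly matters when the last rows or columns of the real matrix happen to contain no entries.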
PP> However, I couldn't understand the use of the commands below.
MM. <- tcrossprod(M) # == MM' := M %*% t(M)
M.1 <- M %*% rep(1, ncol(M))
stopifnot(identical(drop(M.1), rowSums(M)))
They were just for illustrative purposes,
to show that, and how, you can work with the newly created
sparse matrix 'M'.
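To make the two illustrative commands concrete, here is a small sketch on a 2 x 3 matrix where the results can be checked by hand. The matrix 'A' and its entries are made up for illustration only.

```r
library(Matrix)

## A small 2 x 3 sparse matrix:  [1,] 1 . 2
##                               [2,] . 3 .
A <- sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(1, 2, 3))

## tcrossprod(A) is formally equivalent to A %*% t(A)
AAt <- tcrossprod(A)
stopifnot(isTRUE(all.equal(as.matrix(AAt), as.matrix(A %*% t(A)),
                           check.attributes = FALSE)))

## Multiplying by a vector of ones adds up each row,
## so the product must agree with rowSums()
A.1 <- A %*% rep(1, ncol(A))
stopifnot(isTRUE(all.equal(drop(A.1), rowSums(A))))
```

Both results stay small even for the large 'M' above: tcrossprod(M) is n x n, and M.1 is a single column of length n.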
Regards,
Martin Maechler, ETH Zurich
PP> Kindly let me know if I missed something.
PP> Thanks
PP> Pallavi
PP> On Tue, Oct 27, 2009 at 4:12 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>
PP> Hi all,
>>
PP> I used the SparseM package to create a sparse matrix and
PP> followed the commands below.
>>
>> I'd strongly recommend using the 'Matrix' package, which is part of
>> every R distribution (since R 2.9.0).
>>
PP> The sequence of commands is:
>>
>> >> ex <- read.table('fileName',sep=',')
>> >> M <- as.matrix.csr(0,22638,80914)
>> >> for (i in 1:nrow(ex)) { M[ex[i,1],ex[i,2]]<-ex[i,3]}
>>
>> This is very slow in either 'Matrix' or 'SparseM'
>> as soon as nrow(ex) is non-small.
>>
>> However, there are very efficient ways to construct the sparse
>> matrix directly from your 'ex' structure:
>> In 'Matrix' you should use the sparseMatrix() function as you
>> had proposed.
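As a rough sketch of the difference, the element-by-element loop and the single sparseMatrix() call can be compared on a small table. The sizes here (50 x 40, 200 entries) are arbitrary assumptions chosen so the loop finishes quickly. Distinct (i, j) pairs are used because sparseMatrix() adds the x values of repeated pairs, whereas the loop would overwrite them.

```r
library(Matrix)

## Small illustrative triplet table with distinct (i, j) pairs
set.seed(1)
n <- 50; m <- 40
idx <- sample(n * m, 200)             # 200 distinct cell positions
ex <- cbind(i = (idx - 1) %% n + 1,
            j = (idx - 1) %/% n + 1,
            x = round(100 * rnorm(200)))

## Element-by-element assignment: every step modifies the sparse structure
t.loop <- system.time({
    M1 <- Matrix(0, n, m, sparse = TRUE)
    for (k in seq_len(nrow(ex))) M1[ex[k, "i"], ex[k, "j"]] <- ex[k, "x"]
})

## One vectorized call building the same matrix
t.vec <- system.time(
    M2 <- sparseMatrix(i = ex[, "i"], j = ex[, "j"], x = ex[, "x"],
                       dims = c(n, m))
)

## Both approaches give the same matrix; only the cost differs
stopifnot(isTRUE(all.equal(as.matrix(M1), as.matrix(M2),
                           check.attributes = FALSE)))
rbind(loop = t.loop, vectorized = t.vec)[, "elapsed"]
```

The timings depend on the machine and the Matrix version, so no particular numbers are claimed here.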
>>
>> Here I provide a reproducible example,
>> using a random 'ex':
>>
>>
>> n <- 22638
>> m <- 80914
>> nnz <- 300000 # no idea if this is realistic for you
>>
>> set.seed(101)
>> ex <- cbind(i = sample(n,nnz, replace=TRUE),
>>             j = sample(m,nnz, replace=TRUE),
>>             x = round(100 * rnorm(nnz)))
>>
>> library(Matrix)
>>
>> M <- sparseMatrix(i = ex[,"i"],
>>                   j = ex[,"j"],
>>                   x = ex[,"x"])
>> MM. <- tcrossprod(M) # == MM' := M %*% t(M)
>>
>> M.1 <- M %*% rep(1, ncol(M))
>> stopifnot(identical(drop(M.1), rowSums(M)))
>>
>> ## .... and now do other stuff with your sparse matrix M
>>
>>
PP> Even after 4 hours, I can still see the above command running. But I am not
PP> sure whether it got stuck somewhere.
>>
PP> Also, when I initialize matrix M and try to display the values, I can see
PP> something like this
PP> [1] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
PP> 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
PP> 2 2 2 2 2 2 2 2 2 2
PP> [85] 2 2
>>
PP> And after I stopped the above initialization loop over the table (after 4
PP> hours), I could see different values.
>>
PP> Could someone kindly explain what these numbers are about and how I can test
PP> that my command is running and not just stuck somewhere?
>>
PP> Also, it would be great if someone could point me to a tutorial, if any, on sparse
PP> matrices in R, as I couldn't find one on the internet.
>>
PP> Thanks
PP> Pallavi
>>
>>
>>
PP> Pallavi Palleti wrote:
>> >>
>> >> Hi David,
>> >>
>> >> Thanks for your help. This is exactly what I want.
>> >> But my matrix has 25k rows and 80k columns. So
>> >> when I define a matrix object, it throws an error saying it cannot
>> >> allocate a vector of length (25k * 80k). I heard that this data can still
>> >> be loaded into R using sparseMatrix. However, I couldn't find the syntax for
>> >> creating one. Could someone kindly help me in this regard?
>> >>
>> >> Thanks
>> >> Pallavi
>> >>
>> >>
>> >> David Winsemius wrote:
>> >>>
>> >>>
>> >>> On Oct 26, 2009, at 5:06 AM, Pallavi Palleti wrote:
>> >>>
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I am new to R and still learning it. I would like to create a sparse
>> >>>> matrix from an existing file whose contents are in the format
>> >>>> "rowIndex,columnIndex,value"
>> >>>>
>> >>>> for ex:
>> >>>> 1,2,14
>> >>>> 2,4,15
>> >>>>
>> >>>> I would like to create a sparse matrix by taking the above as input.
>> >>>> However, I couldn't find an example where the data was being read
>> >>>> from a file. I tried searching the R tutorial and also searched the
>> >>>> web, but in vain. Could someone kindly show me how to give the above
>> >>>> format as input in R to create a sparse matrix?
>> >>>
>> >>> ex <- read.table(textConnection("1,2,14
>> >>> 2,4,15") , sep=",")
>> >>> ex
>> >>> # V1 V2 V3
>> >>> #1 1 2 14
>> >>> #2 2 4 15
>> >>>
>> >>> M <- Matrix(0, 20, 20)
>> >>>
>> >>> > M
>> >>> #20 x 20 sparse Matrix of class "dsCMatrix"
>> >>>
>> >>> [1,] . . . . . . . . . . . . . . . . . . . .
>> >>> [2,] . . . . . . . . . . . . . . . . . . . .
>> >>> [3,] . . . . . . . . . . . . . . . . . . . .
>> >>> snip
>> >>>
>> >>> for (i in 1:nrow(ex) ) { M[ex[i, 1], ex[i, 2] ] <- ex[i, 3] }
>> >>>
>> >>> > M
>> >>> 20 x 20 sparse Matrix of class "dgCMatrix"
>> >>>
>> >>> [1,] . 14 . . . . . . . . . . . . . . . . . .
>> >>> [2,] . . . 15 . . . . . . . . . . . . . . . .
>> >>> [3,] . . . . . . . . . . . . . . . . . . . .
>> >>> snip
>> >>> >
>> >>> --
>> >>>
>> >>> David Winsemius, MD
>> >>> Heritage Laboratories
>> >>> West Hartford, CT
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>>
>> >>
>> >>
>>
PP> --
PP> View this message in context: http://www.nabble.com/Creating-a-sparse-matrix-from-a-file-tp26056334p26075036.html
PP> Sent from the R help mailing list archive at Nabble.com.
>>