[R] Creating a sparse matrix from a file
Martin Maechler
maechler at stat.math.ethz.ch
Wed Oct 28 14:52:52 CET 2009
>>>>> "PP" == Pallavi P <pallavip.05 at gmail.com>
>>>>> on Wed, 28 Oct 2009 16:30:25 +0530 writes:
PP> Hi Martin,
PP> I followed your example on my own data, which has non-zero values in
PP> 300k positions of a 22638 x 80914 sparse matrix. I was able to load the data
PP> into a sparse matrix and perform some operations (essentially t(m) %*% m).
PP> However, when I tried to display a value of the resulting matrix, I got the
PP> error below:
PP> Error in asMethod(object) :
PP> Cholmod error 'out of memory' at file:../Core/cholmod_memory.c, line 148
PP> The sequence of commands I used is:
>> uac=read.table('C:\\personal\\code\\data\\user_album_count.csv',sep=',', header=TRUE)
>> library(Matrix)
>> m<-sparseMatrix(i=uac[,"user"],j=uac[,"item"],x=uac[,"count"])
>> cm<-t(m) %*% m
The above is less efficient than
cm <- crossprod(m)
Please use the latter (not just for sparse matrices, but for all
matrices in R!).
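As a minimal sketch of that point: the small random matrix below, its dimensions and the names cm1 and cm2 are made up for illustration and are not taken from the data in this thread. Both forms give the same values; crossprod() simply avoids forming t(m) explicitly.

```r
library(Matrix)

set.seed(1)
# Small stand-in for the real 22638 x 80914 matrix (sizes are assumptions).
m <- sparseMatrix(i = sample(20, 50, replace = TRUE),
                  j = sample(30, 50, replace = TRUE),
                  x = round(100 * rnorm(50)),
                  dims = c(20, 30))

cm1 <- t(m) %*% m      # explicit transpose, then multiply
cm2 <- crossprod(m)    # same product, without forming t(m) first

# Both are 30 x 30 and agree entry by entry.
stopifnot(isTRUE(all.equal(unname(as.matrix(cm1)),
                           unname(as.matrix(cm2)))))
```

The comparison goes through as.matrix() only because the example is tiny; on a matrix of the size discussed here that dense conversion would not be advisable.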
PP> Up to this point everything ran; however, when I tried to display cm[1,1],
PP> I got the above error. Kindly let me know if I am doing anything wrong
PP> here.
Interestingly, we had a recent thread on R-devel,
which also made a point about excessive memory usage when
accessing elements of a sparse matrix.
I'd really like to investigate further;
but can you ***PLEASE*** use reproducible code, i.e.,
similar to the one I used, rather than reading data from one of
your files.
Note that your matrix is still fine and you should be able to work
with it, even though it seems the operation
a <- cm[1,1]
is currently implemented very sub-optimally.
I'm busy for the rest of today with other duties,
but am looking forward to receiving **reproducible** code from
you, by tonight.
Also, please do not forget to show the result of
sessionInfo()!
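A reproducible report could look roughly like the sketch below. It reuses the random-data recipe quoted further down in this thread, but the sizes are scaled down here (an assumption, so that it runs quickly) and the column count is called p rather than m (also an assumption, to avoid a clash with the matrix named m above). Raise n, p and nnz towards 22638, 80914 and 300000 to approach the reported case.

```r
library(Matrix)

# Scaled-down stand-ins for the real sizes; these values are assumptions.
n   <- 2000
p   <- 8000
nnz <- 30000

set.seed(101)
ex <- cbind(i = sample(n, nnz, replace = TRUE),
            j = sample(p, nnz, replace = TRUE),
            x = round(100 * rnorm(nnz)))

M  <- sparseMatrix(i = ex[, "i"], j = ex[, "j"], x = ex[, "x"],
                   dims = c(n, p))
cm <- crossprod(M)   # same values as t(M) %*% M

a <- cm[1, 1]        # the element access that failed in the report
print(a)

sessionInfo()        # include this output in the reply
```

Since everything is generated from set.seed(101), anyone on the list can rerun exactly the same steps and compare results and memory use.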
Martin Maechler,
PP> Thanks
PP> Pallavi
PP> On Tue, Oct 27, 2009 at 8:34 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>> >>>>> "PP" == Pallavi P <pallavip.05 at gmail.com>
>> >>>>> on Tue, 27 Oct 2009 18:13:22 +0530 writes:
>>
PP> Hi Martin,
PP> Thanks for the help. Just to make sure I understand correctly.
>>
PP> The steps below are for creating an example table similar to the one that I
PP> read from file.
>>
>> yes, exactly
>>
>> n <- 22638
>> m <- 80914
>> nnz <- 300000 # no idea if this is realistic for you
>>
>> set.seed(101)
>> ex <- cbind(i = sample(n,nnz, replace=TRUE),
>> j = sample(m,nnz, replace=TRUE),
>> x = round(100 * rnorm(nnz)))
>>
>>
PP> and I can understand the way sparseMatrix is initialized right now as
>> M <- sparseMatrix(i = ex[,"i"],
>> j = ex[,"j"],
>> x = ex[,"x"])
>>
PP> However, I couldn't understand the use of the commands below.
>>
>> MM. <- tcrossprod(M) # == MM' := M %*% t(M)
>> M.1 <- M %*% rep(1, ncol(M))
>> stopifnot(identical(drop(M.1), rowSums(M)))
>>
>> They were just for illustrative purposes,
>> to show how and that you can work with the created sparse matrix
>> 'M'.
>>
>> Regards,
>> Martin Maechler, ETH Zurich
>>
PP> Kindly let me know if I missed something.
>>
PP> Thanks
PP> Pallavi
>>
>>
More information about the R-help
mailing list