[R] What exactly is a dgCMatrix-class? There are so many attributes.

C W tmrsg11 at gmail.com
Fri Oct 20 22:01:06 CEST 2017


Subsetting with [] and with head() gives different results.

R code:

> head(train$data, 5)
[1] 0 0 1 0 0

> train$data[1:5, 1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
     cap-shape=bell cap-shape=conical cap-shape=convex
[1,]              .                 .                1
[2,]              .                 .                1
[3,]              1                 .                .
[4,]              .                 .                1
[5,]              .                 .                1
     cap-shape=flat cap-shape=knobbed
[1,]              .                 .
[2,]              .                 .
[3,]              .                 .
[4,]              .                 .
[5,]              .                 .
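
A likely explanation, offered as a sketch rather than a definitive diagnosis: if the Matrix package is not attached, head() may fall back to the default method, which in R versions of this era takes x[seq_len(n)]. Single-index subsetting reads the matrix as one long column-major vector, so the result is the first five entries of column 1, which matches the cap-shape=bell column above. The small matrix m below is a hypothetical stand-in for train$data.

```r
library(Matrix)

# Hypothetical 5 x 3 stand-in for train$data: one stored 1 per column.
m <- sparseMatrix(i = c(3, 1, 2), j = c(1, 2, 3), x = rep(1, 3), dims = c(5, 3))

# Two indices: a 2-D subset that stays a sparse matrix.
block <- m[1:5, 1:3]

# One index: the matrix is read as a column-major vector, so this
# returns rows 1..5 of column 1 (here 0 0 1 0 0).
flat <- m[1:5]

print(class(block))
print(flat)
```

With library(Matrix) attached, or in a recent R, head(m, 5) should return the first five rows instead, so the behaviour depends on the setup.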

On Fri, Oct 20, 2017 at 3:51 PM, C W <tmrsg11 at gmail.com> wrote:

> Thank you for your responses.
>
> I guess I don't feel alone. I don't find that the documentation goes into
> any detail.
>
> I also find it surprising that,
>
> > object.size(train$data)
> 1730904 bytes
>
> > object.size(as.matrix(train$data))
> 6575016 bytes
>
> the dgCMatrix actually takes less memory, though it *looks* like the
> opposite.
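
The two sizes can be roughly reproduced by hand. This is a back-of-the-envelope sketch using the dimensions and entry count from the str() output quoted further down in this thread; it assumes 8 bytes per double and 4 bytes per integer, and it ignores object headers and dimnames, which presumably account for the remaining few kilobytes.

```r
# Dimensions and stored-entry count taken from the str() output in this thread.
n_row <- 6513
n_col <- 126
n_stored <- 143286

# Dense double matrix: 8 bytes for every cell, zero or not.
dense_bytes <- n_row * n_col * 8

# dgCMatrix: 8 bytes per stored value (@x), 4 per row index (@i),
# and 4 per column pointer (@p, length n_col + 1).
sparse_bytes <- n_stored * 8 + n_stored * 4 + (n_col + 1) * 4

print(c(dense = dense_bytes, sparse = sparse_bytes))
```

Both estimates land close to the object.size() figures above, so the saving comes from storing only the entries that are actually present.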
>
> Cheers!
>
> On Fri, Oct 20, 2017 at 3:22 PM, David Winsemius <dwinsemius at comcast.net>
> wrote:
>
>>
>> > On Oct 20, 2017, at 11:11 AM, C W <tmrsg11 at gmail.com> wrote:
>> >
>> > Dear R list,
>> >
>> > I came across dgCMatrix. I believe this class is associated with sparse
>> > matrices.
>>
>> Yes. See:
>>
>>  help('dgCMatrix-class', package = "Matrix")
>>
>> If Martin Maechler happens to respond to this you should listen to him
>> rather than anything I write. Much of what the Matrix package does appears
>> to be magical to one such as I.
>>
>> >
>> > I see there are 8 attributes to train$data. I am confused about why there
>> > are so many. Some are vectors; what do they do?
>> >
>> > Here's the R code:
>> >
>> > library(xgboost)
>> > data(agaricus.train, package='xgboost')
>> > data(agaricus.test, package='xgboost')
>> > train <- agaricus.train
>> > test <- agaricus.test
>> > attributes(train$data)
>> >
>>
>> I got a bit of an annoying surprise when I did something similar. It
>> appeared to me that I did not need to load the xgboost library since all
>> that was being asked was "where is the data" in an object that should be
>> loaded from that library using the `data` function. The last command asking
>> for the attributes filled up my console with a 100K length vector (actually
>> 2 of such vectors). The `str` function returns a more useful result.
>>
>> > data(agaricus.train, package='xgboost')
>> > train <- agaricus.train
>> > names( attributes(train$data) )
>> [1] "i"        "p"        "Dim"      "Dimnames" "x"        "factors"  "class"
>> > str(train$data)
>> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
>>   ..@ i       : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
>>   ..@ p       : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991 ...
>>   ..@ Dim     : int [1:2] 6513 126
>>   ..@ Dimnames:List of 2
>>   .. ..$ : NULL
>>   .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
>>   ..@ x       : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
>>   ..@ factors : list()
>>
>> > Where is the data, is it in $p, $i, or $x?
>>
>> So the "data" (meaning the values of the sparse matrix) are in the @x
>> slot. The values all appear to be the number 1. The @i slot holds the
>> row locations of the stored entries, while the @p items are somehow
>> connected with the columns (I think, since 127 and 126=number of columns
>> from the @Dim slot are only off by 1).
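
To make the roles of the three slots concrete, here is a minimal sketch that rebuilds (row, column, value) triplets from @i, @p and @x. It assumes the usual compressed-column layout, in which @i holds 0-based row indices and @p holds 0-based offsets marking where each column starts; the matrix m and its values are hypothetical.

```r
library(Matrix)

# Hypothetical 4 x 3 matrix with stored entries at (2,1), (4,1) and (1,3).
m <- sparseMatrix(i = c(2, 4, 1), j = c(1, 1, 3), x = c(10, 20, 30),
                  dims = c(4, 3))

# @i is 0-based, so add 1 to get R row numbers.
rows <- m@i + 1L

# diff(@p) is the number of stored entries in each column;
# repeating each column number that many times recovers the columns.
cols <- rep(seq_len(ncol(m)), times = diff(m@p))

triplets <- data.frame(row = rows, col = cols, value = m@x)
print(triplets)
```

Column 2 has no stored entries, so its count in diff(m@p) is 0 and it never appears in the triplets.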
>>
>> Doing this > colSums(as.matrix(train$data))
>>                   cap-shape=bell                cap-shape=conical
>>                              369                                3
>>                 cap-shape=convex                   cap-shape=flat
>>                             2934                             2539
>>                cap-shape=knobbed                 cap-shape=sunken
>>                              644                               24
>>              cap-surface=fibrous              cap-surface=grooves
>>                             1867                                4
>>                cap-surface=scaly               cap-surface=smooth
>>                             2607                             2035
>>                  cap-color=brown                   cap-color=buff
>>                             1816
>> # now snipping the rest of that output.
>>
>>
>>
>> Now this makes me think that the @p vector gives you the cumulative sum
>> of number of items per column:
>>
>> > all( cumsum( colSums(as.matrix(train$data)) ) == train$data@p[-1] )
>> [1] TRUE
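
One caveat on the check above: it holds here because every stored value is 1, so column sums and column counts coincide. As far as I can tell, @p tracks the cumulative number of stored entries per column, not the cumulative sum of their values. A small sketch with a hypothetical matrix whose values are not all 1:

```r
library(Matrix)

# Hypothetical 3 x 3 matrix whose stored values are not all 1.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 7, 2),
                  dims = c(3, 3))

# Column sums depend on the values, so this comparison fails here.
sums_match <- all(cumsum(colSums(m)) == m@p[-1])

# Counting nonzero entries per column reproduces @p[-1].
counts_match <- all(cumsum(colSums(m != 0)) == m@p[-1])

print(c(sums_match = sums_match, counts_match = counts_match))
```

Equivalently, diff(m@p) gives the number of stored entries in each column.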
>>
>> >
>> > Thank you very much!
>> >
>> >       [[alternative HTML version deleted]]
>>
>> Please read the Posting Guide. Your code was not mangled in this
>> instance, but HTML code often arrives in an unreadable mess.
>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> 'Any technology distinguishable from magic is insufficiently advanced.'
>>  -Gehm's Corollary to Clarke's Third Law
