[R] What exactly is a dgCMatrix-class? There are so many attributes.
C W
tmrsg11 at gmail.com
Fri Oct 20 21:51:16 CEST 2017
Thank you for your responses.
I guess I don't feel alone. The documentation doesn't seem to go into any
detail.
I also find it surprising that,
> object.size(train$data)
1730904 bytes
> object.size(as.matrix(train$data))
6575016 bytes
the dgCMatrix actually takes less memory, though it *looks* like the
opposite.
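
The size difference follows from the storage layout: a dense numeric matrix
stores nrow * ncol doubles, while a dgCMatrix stores only the nonzero values,
their row indices, and one pointer per column. Here is a minimal sketch on a
made-up matrix (the dimensions and density are arbitrary, not the agaricus
data), assuming the Matrix package is installed:

```r
library(Matrix)

set.seed(1)
# Hypothetical 1000 x 100 matrix with exactly 2000 nonzero cells (2%)
nr <- 1000
nc <- 100
dense <- matrix(0, nr, nc)
dense[sample(nr * nc, 2000)] <- 1

# Convert to compressed sparse column storage
sparse <- Matrix(dense, sparse = TRUE)

class(sparse)
object.size(dense)    # about 8 bytes per cell
object.size(sparse)   # about 12 bytes per nonzero entry, plus column pointers
```

For train$data the same arithmetic gives roughly 6513 * 126 * 8 = 6,565,104
bytes dense versus 143286 * 12 = 1,719,432 bytes for @x and @i together, which
is close to the object.size figures above once the column names and other
overhead are added.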
Cheers!
On Fri, Oct 20, 2017 at 3:22 PM, David Winsemius <dwinsemius at comcast.net>
wrote:
>
> > On Oct 20, 2017, at 11:11 AM, C W <tmrsg11 at gmail.com> wrote:
> >
> > Dear R list,
> >
> > I came across dgCMatrix. I believe this class is associated with sparse
> > matrix.
>
> Yes. See:
>
> help('dgCMatrix-class', package = 'Matrix')
>
> If Martin Maechler happens to respond to this you should listen to him
> rather than anything I write. Much of what the Matrix package does appears
> to be magical to one such as I.
>
> >
> > I see there are 8 attributes to train$data. I am confused about why there
> > are so many. Some are vectors; what do they do?
> >
> > Here's the R code:
> >
> > library(xgboost)
> > data(agaricus.train, package='xgboost')
> > data(agaricus.test, package='xgboost')
> > train <- agaricus.train
> > test <- agaricus.test
> > attributes(train$data)
> >
>
> I got a bit of an annoying surprise when I did something similar. It
> appeared to me that I did not need to load the xgboost library since all
> that was being asked was "where is the data" in an object that should be
> loaded from that library using the `data` function. The last command asking
> for the attributes filled up my console with a 100K length vector (actually
> 2 of such vectors). The `str` function returns a more useful result.
>
> > data(agaricus.train, package='xgboost')
> > train <- agaricus.train
> > names( attributes(train$data) )
> [1] "i" "p" "Dim" "Dimnames" "x" "factors" "class"
> > str(train$data)
> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ..@ i : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
> ..@ p : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991 ...
> ..@ Dim : int [1:2] 6513 126
> ..@ Dimnames:List of 2
> .. ..$ : NULL
> .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
> ..@ x : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
> ..@ factors : list()
>
> > Where is the data, is it in $p, $i, or $x?
>
> So the "data" (meaning the values of the sparse matrix) are in the @x
> leaf. The values all appear to be the number 1. The @i leaf is the sequence
> of row locations for the value entries, while the @p items are somehow
> connected with the columns (I think, since 127 and 126=number of columns
> from the @Dim leaf are only off by 1).
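
To make the roles of @x, @i and @p concrete, here is a small sketch on a
made-up 4 x 3 matrix (hypothetical values, not the agaricus data). In the
compressed column format used by dgCMatrix, @x holds the nonzero values column
by column, @i holds the 0-based row index of each value, and @p holds 0-based
offsets marking where each column starts, which is why it has ncol + 1 entries:

```r
library(Matrix)

# Hypothetical 4 x 3 example; the vector is filled column by column
m <- Matrix(c(0, 2, 0, 0,
              1, 0, 0, 3,
              0, 0, 0, 4), nrow = 4, sparse = TRUE)

m@x   # nonzero values, stored column by column
m@i   # 0-based row index of each value in @x
m@p   # 0-based column start offsets into @x and @i; length ncol + 1

# Entries of column j (1-based) sit at positions p[j] + 1 .. p[j + 1]
j <- 2
idx <- seq.int(m@p[j] + 1, length.out = m@p[j + 1] - m@p[j])
col_j <- data.frame(row = m@i[idx] + 1, value = m@x[idx])
col_j
```

Using length.out rather than a from:to range keeps the lookup safe for columns
with no stored entries, where p[j] equals p[j + 1].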
>
> Doing this:
>
> > colSums(as.matrix(train$data))
> cap-shape=bell cap-shape=conical
> 369 3
> cap-shape=convex cap-shape=flat
> 2934 2539
> cap-shape=knobbed cap-shape=sunken
> 644 24
> cap-surface=fibrous cap-surface=grooves
> 1867 4
> cap-surface=scaly cap-surface=smooth
> 2607 2035
> cap-color=brown cap-color=buff
> 1816
> # now snipping the rest of that output.
>
>
>
> Now this makes me think that the @p vector gives you the cumulative sum of
> number of items per column:
>
> > all( cumsum( colSums(as.matrix(train$data)) ) == train$data@p[-1] )
> [1] TRUE
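
One caveat on the check above: it works because every stored value appears to
be 1, so each column sum equals the number of stored entries in that column. In
general @p is the running count of stored entries per column, not of their
values, so diff() on @p recovers the per-column counts without converting to a
dense matrix. A minimal sketch on a made-up matrix (hypothetical values):

```r
library(Matrix)

# Hypothetical 4 x 3 matrix whose nonzero values are not all 1
m <- Matrix(c(0, 2, 0, 0,
              1, 0, 0, 3,
              0, 0, 0, 4), nrow = 4, sparse = TRUE)

# Number of stored entries in each column
entries_per_col <- diff(m@p)
entries_per_col

# @p is 0 followed by the running total of those counts
all(m@p == c(0L, cumsum(entries_per_col)))

# Column sums add up values, so they differ from the counts here
colSums(m)
```

With the Matrix package loaded, colSums() works on the sparse object directly,
so the as.matrix() conversion is not needed for the sums either.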
>
> >
> > Thank you very much!
> >
> > [[alternative HTML version deleted]]
>
> Please read the Posting Guide. Your code was not mangled in this instance,
> but HTML code often arrives in an unreadable mess.
>
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
> -Gehm's Corollary to Clarke's Third Law
[[alternative HTML version deleted]]