[R] How to see if row names of a dataframe are stored compactly

Hsiu-Khuern Tang hsiu-khuern.tang at hp.com
Sat Oct 14 02:36:01 CEST 2006


Hi Gabor,

* On Fri 07:59PM, 13 Oct 2006, Gabor Grothendieck (ggrothendieck at gmail.com) wrote:
> Try this:
> 
> >class(attributes(x)$row.names)
> [1] "integer"
> >rownames(x) <- as.character(rownames(x))
> >class(attributes(x)$row.names)
> [1] "character"

Yes, but this doesn't show that row.names was stored as a _single_
integer (3) instead of a vector of integers (1:3).

Reading the changes again:

    The internal storage of row.names = 1:n just records 'n', for
    efficiency with very long vectors.

    The "row.names" attribute must be a character or integer
    vector, and this is now enforced by the C code.

I think row.names is always _printed_ as a vector.  I had misinterpreted the
help(row.names) paragraph in my original posting to mean that the internal
storage can be revealed by attributes(x)$row.names.  That paragraph implies
that attributes(x)$row.names and attr(x, "row.names") can give different
results, but I can't create such an example.
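
For readers on later versions of R, the stored form can be inspected
directly.  The sketch below assumes that the helper .row_names_info() is
available (it may not exist in 2.4.0); with type 0 it returns the
"row.names" attribute as stored, and with type 2 the number of rows it
implies.  The object names internal_x and internal_y are only illustrative.

```r
# Sketch: inspect the internal "row.names" attribute of a data frame.
# Assumes an R version that provides .row_names_info().
x <- as.data.frame(matrix(1:9, nrow = 3))

# Expanded form, as attr() and row.names() report it
attr(x, "row.names")

# type 0: the attribute as actually stored (two elements if compact)
internal_x <- .row_names_info(x, 0L)
internal_x

# Integer row names that are not 1:n cannot use the compact form
y <- x
row.names(y) <- 2:4
internal_y <- .row_names_info(y, 0L)
internal_y

# type 2: number of rows implied by the attribute, whatever the storage
.row_names_info(x, 2L)
.row_names_info(y, 2L)
```

If the first result has two elements, the first of them NA, the row names
are held compactly; a full-length vector means they are not.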

I did this experiment:

> n <- 10000
> x <- as.data.frame(matrix(seq(len=2*n), nrow=n))
> head(x)
  V1    V2
1  1 10001
2  2 10002
3  3 10003
4  4 10004
5  5 10005
6  6 10006
> class(attributes(x)$row.names)
[1] "integer"
> save(x, file="x1", compress=FALSE)
> row.names(x) <- 2:(n+1)
> class(attributes(x)$row.names)
[1] "integer"
> save(x, file="x2", compress=FALSE)
> subset(file.info(c("x1", "x2")), select=size)
     size
x1  80205
x2 120197

The difference in size, 39992 bytes, is about nrow(x) * 4 bytes, i.e. one
4-byte integer per row.  I think this shows that 1:n was stored compactly
but 2:(n+1) was stored as a full integer vector.
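
The size estimate can be checked with a few lines of arithmetic on the
figures reported by file.info().  The gap works out to n - 2 integers
rather than n, which would be consistent with the compact form taking two
integers instead of one; that reading is an assumption, not something the
file sizes alone establish.  The variable names are only illustrative.

```r
# File sizes reported by file.info() in the experiment
n <- 10000
size_compact  <- 80205    # x1: row names 1:n
size_expanded <- 120197   # x2: row names 2:(n+1)

bytes_per_int <- 4        # an R integer occupies 4 bytes

# Extra bytes written when the row names are not 1:n
diff_bytes <- size_expanded - size_compact
diff_bytes

# The same gap expressed as a number of integers
diff_bytes / bytes_per_int

# Compare with n - 2
diff_bytes == bytes_per_int * (n - 2)
```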

> On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang at hp.com> wrote:
> >Reading the list of changes for R version 2.4.0, I was happy to see that
> >the row names of dataframes can be stored compactly (as the integer n when
> >row.names(df) is 1:n).
> >
> >help(row.names) contains this paragraph:
> >
> >   Row names of the form '1:n' for 'n > 2' are stored internally in a
> >   compact form, which might be seen by calling 'attributes' but never
> >   via 'row.names' or 'attr(x, "row.names")'.
> >
> >I am unable to get attributes(x)$row.names to return just nrow(x).  Am I
> >misreading the documentation?  Does "might be seen" mean "possibly in some
> >future version of R" in this case?
> >
> >> (x <- as.data.frame(matrix(1:9, nrow=3)))
> > V1 V2 V3
> >1  1  4  7
> >2  2  5  8
> >3  3  6  9
> >> attributes(x)$row.names
> >[1] 1 2 3
> >> row.names(x) <- seq(len=nrow(x))
> >> attributes(x)$row.names
> >[1] 1 2 3
> >
> >Best,
> >Hsiu-Khuern.
> >

Best,
Hsiu-Khuern.


