[R] How to see if row names of a dataframe are stored compactly
Hsiu-Khuern Tang
hsiu-khuern.tang at hp.com
Sat Oct 14 07:58:14 CEST 2006
* On Fri 10:14PM, 13 Oct 2006, jim holtman (jholtman at gmail.com) wrote:
> Take a look with 'dput' and you will see the difference:
>
> >row.names(x) <- 1:n
> >dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"),
> row.names = c(NA, 10), class = "data.frame")
> >row.names(x) <- 2:(n+1)
> >dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"),
> row.names = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")
> >
>
> 'row.names' is different.
So it is! Thank you! This also explains why the n x 2 dataframe became
larger by exactly (n-2) * 4 bytes when row.names changed from 1:n to
2:(n+1).
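
The (n-2) comes from the compact form seen in the dput output above: it holds
two values, c(NA, n), in place of n integers, so the saving is (n - 2) integers
of 4 bytes each. A minimal sketch of both checks follows. It assumes a current
R version, where .row_names_info(x, 0L) returns the internal attribute without
expanding it (that function may not be available in 2.4.0), and
has_compact_row_names is a hypothetical helper name, not something from this
thread.

```r
# Hypothetical helper (not from the thread): TRUE when the "row.names"
# attribute is held internally in the two-element compact form c(NA, n).
has_compact_row_names <- function(df) {
  rn <- .row_names_info(df, 0L)  # type 0L: internal attribute, not expanded
  is.integer(rn) && length(rn) == 2L && is.na(rn[1L])
}

n <- 10
x <- as.data.frame(matrix(seq_len(2 * n), nrow = n))
compact_before <- has_compact_row_names(x)  # row names are 1:n here

row.names(x) <- 2:(n + 1)
compact_after <- has_compact_row_names(x)   # row names are now 2:(n+1)

# File sizes reported in this thread for n = 10000 (files x1 and x2).
size_diff     <- 120197 - 80205
expected_diff <- (10000 - 2) * 4
```

Checking the sign of the second element is deliberately avoided here, since
only the c(NA, n) shape is needed to tell the two storage forms apart.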
> On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang at hp.com> wrote:
> >Hi Gabor,
> >
> >* On Fri 07:59PM, 13 Oct 2006, Gabor Grothendieck
> >(ggrothendieck at gmail.com) wrote:
> >> Try this:
> >>
> >> >class(attributes(x)$row.names)
> >> [1] "integer"
> >> >rownames(x) <- as.character(rownames(x))
> >> >class(attributes(x)$row.names)
> >> [1] "character"
> >
> >Yes, but this doesn't show that row.names was stored as a _single_
> >integer (3) instead of a vector of integers (1:3).
> >
> >Reading the changes again:
> >
> > The internal storage of row.names = 1:n just records 'n', for
> > efficiency with very long vectors.
> >
> > The "row.names" attribute must be a character or integer
> > vector, and this is now enforced by the C code.
> >
> >I think row.names is always _printed_ as a vector. I had misinterpreted the
> >help(row.names) paragraph in my original posting to mean that the internal
> >storage can be revealed by attributes(x)$row.names. That paragraph implies
> >that attributes(x)$row.names and attr(x, "row.names") can have different
> >classes, but I can't create such an example.
> >
> >I did this experiment:
> >
> >> n <- 10000
> >> x <- as.data.frame(matrix(seq(len=2*n), nrow=n))
> >> head(x)
> > V1 V2
> >1 1 10001
> >2 2 10002
> >3 3 10003
> >4 4 10004
> >5 5 10005
> >6 6 10006
> >> class(attributes(x)$row.names)
> >[1] "integer"
> >> save(x, file="x1", compress=FALSE)
> >> row.names(x) <- 2:(n+1)
> >> class(attributes(x)$row.names)
> >[1] "integer"
> >> save(x, file="x2", compress=FALSE)
> >> subset(file.info(c("x1", "x2")), select=size)
> > size
> >x1 80205
> >x2 120197
> >
> >The difference in size is about nrow(x) * 4 bytes. I think this shows that
> >1:n was stored compactly as a single integer but 2:(n+1) was not.
> >
> >> On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang at hp.com> wrote:
> >> >Reading the list of changes for R version 2.4.0, I was happy to see that
> >> >the row names of dataframes can be stored compactly (as the integer n
> >> >when row.names(df) is 1:n).
> >> >
> >> >help(row.names) contains this paragraph:
> >> >
> >> > Row names of the form '1:n' for 'n > 2' are stored internally in a
> >> > compact form, which might be seen by calling 'attributes' but never
> >> > via 'row.names' or 'attr(x, "row.names")'.
> >> >
> >> >I am unable to get attributes(x)$row.names to return just nrow(x). Am I
> >> >misreading the documentation? Does "might be seen" mean "possibly in some
> >> >future version of R" in this case?
> >> >
> >> >> (x <- as.data.frame(matrix(1:9, nrow=3)))
> >> > V1 V2 V3
> >> >1 1 4 7
> >> >2 2 5 8
> >> >3 3 6 9
> >> >> attributes(x)$row.names
> >> >[1] 1 2 3
> >> >> row.names(x) <- seq(len=nrow(x))
> >> >> attributes(x)$row.names
> >> >[1] 1 2 3
> >> >
> >> >Best,
> >> >Hsiu-Khuern.
> >> >
> >> >______________________________________________
> >> >R-help at stat.math.ethz.ch mailing list
> >> >https://stat.ethz.ch/mailman/listinfo/r-help
> >> >PLEASE do read the posting guide
> >> >http://www.R-project.org/posting-guide.html
> >> >and provide commented, minimal, self-contained, reproducible code.
> >> >
> >
> >Best,
> >Hsiu-Khuern.
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
Best,
Hsiu-Khuern.