[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput
Mike Prager
mike.prager at noaa.gov
Wed Feb 28 22:06:38 CET 2007
Peter--
Thank you. Am I correct in understanding, then, that,
(1) The syntax I asked about is a special case, and the parser
and/or dget() somehow recognize it as such, and
(2) The syntax 1:15 (where 15 is the number of rows) should
work just as well as c(NA, 15)?
I ask, again, because I want to ensure the widest possible
compatibility for the way For2R is writing data in emulation of
dput().
--Mike
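
For what it is worth, point (2) can be checked directly. The following is only a minimal sketch, assuming a reasonably recent R; the object names `compact` and `explicit` are illustrative and not part of For2R or dput(). It builds the same data frame with both spellings of row.names, evaluated the way dget() would evaluate them, and compares the results.

```r
# Two spellings of the same data frame, as they might appear in a
# dput()-style file. 'compact' and 'explicit' are illustrative names.
compact <- structure(
  list(id   = c(1991, 1992, 1993, 1994, 1995),
       var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
  names     = c("id", "var1"),
  row.names = as.integer(c(NA, 5)),   # short form: "rows 1..5"
  class     = "data.frame")

explicit <- structure(
  list(id   = c(1991, 1992, 1993, 1994, 1995),
       var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
  names     = c("id", "var1"),
  row.names = 1:5,                    # explicit integer row names
  class     = "data.frame")

# Compare what a user of the data frame would see.
same_names <- identical(row.names(compact), row.names(explicit))
same_dim   <- identical(dim(compact), dim(explicit))
print(same_names)
print(same_dim)
```

If both comparisons print TRUE, a reader of the file sees the same row names and dimensions either way; the two forms may still differ in how they are stored internally.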
Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> Mike Prager wrote:
> > I am trying to understand why syntax used by dput() to write
> > rownames is valid (say, when read by dget()). I ask this
> > because I desire to emulate its actions *reliably* in my For2R
> > routines, and I won't be comfortable until I understand what R
> > is doing.
> >
> > Given data set "fred":
> >
> >
> > > fred
> > id var1
> > 1 1991 0.4388587
> > 2 1992 0.8772471
> > 3 1993 0.6230486
> > 4 1994 0.2340929
> > 5 1995 0.5005605
> >
> > we can try this--
> >
> >
> > > dput(fred, control = "all")
> > structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
> > c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
> > .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
> > class = "data.frame")
> >
> > In the above result, why is the following part valid?
> >
> > row.names = as.integer(c(NA, 5))
> >
> > given that the length of the RHS expression is 2, while the
> > needed length is 5.
> >
> > Moreover, the following doesn't work:
> >
> >
> > > row.names(fred) <- as.integer(c(NA,5))
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) :
> > invalid 'row.names' length
> >
> > Is there any reason why the expression
> >
> > c(NA,5)
> >
> > is better here than the more natural
> >
> > 1:5
> >
> > would be?
> >
> >
> It's mainly a space-saving device. Originally, row.names was a character
> vector, but storage of character vectors is quite inefficient, so we now
> allow integer names and also a very short form where 1:n is stored just
> using the single value n. To distinguish the latter two, we use the
> c(NA, n) form, because row names are not allowed to be missing.
>
> Consider the following, and notice that the character row names take up
> roughly 36 bytes per record, whereas the actual data take only 8 bytes
> per record.
>
> > d<-data.frame(x=rnorm(1000))
> > object.size(d)
> [1] 8392
> > row.names(d)<-as.character(1:1000)
> > object.size(d)
> [1] 44384
> > row.names(d)<-1000:1
> > object.size(d)
> [1] 12384
> > row.names(d)<-NULL
> > object.size(d)
> [1] 8392
>
>
>
>
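
The short form described above can also be inspected from R code. This is a minimal sketch, assuming a reasonably recent R in which the base function `.row_names_info()` is available; the variable names `internal`, `n_rows`, `expanded` and `res` are illustrative. It looks at the stored attribute, at what attr() hands back, and at how the `row.names<-` replacement function treats a length-2 value.

```r
d <- data.frame(x = rnorm(1000))

# type = 0L returns the internally stored "row.names" attribute;
# for automatic row names this is the short two-element form.
internal <- .row_names_info(d, 0L)

# type = 2L returns the number of rows implied by the attribute.
n_rows <- .row_names_info(d, 2L)

# attr() at the R level expands the short form to a full integer vector.
expanded <- attr(d, "row.names")

print(length(internal))
print(n_rows)
print(length(expanded))

# The replacement function checks the length of its value against the
# number of rows, so a length-2 vector is not taken as the short form.
res <- tryCatch({
  row.names(d) <- as.integer(c(NA, 1000))
  "accepted"
}, error = function(e) "rejected")
print(res)
```

This is consistent with the error quoted earlier in the thread: the short form is a storage detail handled when the attribute is set directly, as structure() does, rather than something `row.names<-` accepts as a value.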
> > I will appreciate help from anyone with time to reply.
> >
> > MHP
> >
> >
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.