[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput

Mike Prager mike.prager at noaa.gov
Wed Feb 28 22:06:38 CET 2007


Peter--

Thank you.  Am I correct in understanding, then, that,

(1) The syntax I asked about is a special case, and the parser
and/or dget() somehow recognize it as such, and

(2) The syntax 1:15 (where 15 is the number of rows) should
work just as well as c(NA, 15)?

I ask, again, because I want to ensure the widest possible
compatibility for the way For2R is writing data in emulation of
dput().
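
As a check on point (2), here is a minimal sketch that builds the same
data frame both ways with structure() and then round-trips one of them
through dput()/dget(). It is not For2R's actual output; the object names
(cols, compact, explicit, round_trip) are made up for illustration, and I
am assuming structure() accepts both forms of row.names, which is my
reading of Peter's explanation.

```r
# Columns shared by both variants (values taken from "fred" below).
cols <- list(id = c(1991, 1992, 1993, 1994, 1995),
             var1 = c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605))

# Variant A: the compact form that dput() writes.
compact <- structure(cols, row.names = as.integer(c(NA, 5)),
                     class = "data.frame")

# Variant B: the explicit integer sequence 1:n.
explicit <- structure(cols, row.names = 1:5, class = "data.frame")

# Compare the row names and row counts that R reports for each variant.
same_names <- identical(row.names(compact), row.names(explicit))
n_rows <- c(nrow(compact), nrow(explicit))

# Round trip through a file, the way dget() would read emulated output.
tmp <- tempfile(fileext = ".R")
dput(explicit, file = tmp, control = "all")
round_trip <- dget(tmp)
unlink(tmp)
```

If same_names is TRUE and all.equal(explicit, round_trip) holds, then
writing 1:n would seem to be a safe choice for an emulator, with the
compact form as an optional space saving.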

--Mike


Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:

> Mike Prager wrote:
> > I am trying to understand why syntax used by dput() to write
> > rownames is valid (say, when read by dget()).  I ask this
> > because I desire to emulate its actions *reliably* in my For2R
> > routines, and I won't be comfortable until I understand what R
> > is doing.
> >
> > Given data set "fred":
> >
> > > fred
> >     id      var1
> > 1 1991 0.4388587
> > 2 1992 0.8772471
> > 3 1993 0.6230486
> > 4 1994 0.2340929
> > 5 1995 0.5005605
> >
> > we can try this--
> >
> > > dput(fred, control = "all")
> > structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
> > c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
> > .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
> > class = "data.frame")
> >
> > In the above result, why is the following part valid?
> >
> > row.names = as.integer(c(NA, 5))
> >
> > given that the length of the RHS expression is 2, while the
> > needed length is 5.
> >
> > Moreover, the following doesn't work:
> >
> > > row.names(fred) <- as.integer(c(NA,5))
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) : 
> >         invalid 'row.names' length
> >
> > Is there any reason why the expression c(NA,5) is better here
> > than the more natural 1:5?
> >
> >   
> It's mainly a space-saving device. Originally, row.names was a character 
> vector, but storage of character vectors is quite inefficient, so we now 
> allow integer names and also a very short form where 1:n is stored just 
> using the single value n. To distinguish the latter two, we use the 
> c(NA, n) form, because row names are not allowed to be missing.
> 
> Consider the following and notice how the string row names take up 
> roughly 36 bytes per record, whereas the actual data are only 8 bytes 
> per record.
> 
>  > d<-data.frame(x=rnorm(1000))
>  > object.size(d)
> [1] 8392
>  > row.names(d)<-as.character(1:1000)
>  > object.size(d)
> [1] 44384
>  > row.names(d)<-1000:1
>  > object.size(d)
> [1] 12384
>  > row.names(d)<-NULL
>  > object.size(d)
> [1] 8392
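
The stored form can also be inspected directly. The following is a
minimal sketch, assuming the base helper .row_names_info() is available
and that its type 0 result is the internal attribute and its type 2
result the implied number of rows. Exact byte counts differ between R
versions and platforms, so only the ordering of the sizes is compared.
The variable names are illustrative.

```r
n <- 1000L
d <- data.frame(x = rnorm(n))

# Internal form of automatic row names: a length-2 integer starting with NA.
internal <- .row_names_info(d, 0L)
implied_rows <- .row_names_info(d, 2L)    # number of rows implied by it

size_compact <- object.size(d)

row.names(d) <- as.character(seq_len(n))  # character row names
size_character <- object.size(d)

row.names(d) <- rev(seq_len(n))           # integer names that are not 1:n
size_integer <- object.size(d)

row.names(d) <- NULL                      # back to the compact form
size_reset <- object.size(d)
```

The expectation from the explanation above is size_character >
size_integer > size_compact, with size_reset equal to size_compact.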
> 
> 
> 
> 
> > I will appreciate help from anyone with time to reply.
> >
> > MHP
> >
> >
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.


