[R-sig-Geo] AttributeList and data.table
Matthew Dowle
mdowle at concordiafunds.com
Tue Apr 18 11:26:17 CEST 2006
data.table now appears to be on CRAN. However, given Prof Ripley's mail to
r-devel on Friday: "row.names in data.frame", it would seem data.frame
itself can be changed after all, so data.table could be removed.
> -----Original Message-----
> From: Edzer J. Pebesma [mailto:e.pebesma at geo.uu.nl]
> Sent: 13 April 2006 20:43
> To: pedro at dpi.inpe.br
> Cc: r-sig-geo at stat.math.ethz.ch; Matthew Dowle
> Subject: Re: [R-sig-Geo] AttributeList and data.table
>
>
> Pedro, you're very alert! I saw it too, and had similar thoughts.
> However, I haven't had any complaints yet about the way AttributeLists
> work right now; most of it is hidden behind the scenes anyway. If
> data.table usage becomes widespread we can certainly provide coercion
> functions between the two. Let's first wait until it actually hits CRAN.
> I'm for instance curious what happens if you pass one to lm().
> --
> Edzer
>
> pedro at dpi.inpe.br wrote:
>
> >Hi,
> >
> >There is a quite new package on CRAN called data.table. It implements
> >the class data.table representing a data.frame without rownames, in
> >order to improve performance. So, it has the same objective as the sp
> >class AttributeList. I confess that I have only a superficial knowledge
> >of the functionality available in both classes, but I think the projects
> >could work together, or even be merged.
> >
> >Best wishes,
> >
> >Pedro Andrade
> >
> >---------- Forwarded message ----------
> >Date: Wed, 12 Apr 2006 15:19:10 +0100
> >From: Matthew Dowle <mdowle at concordiafunds.com>
> >To: "'r-devel at r-project.org'" <r-devel at r-project.org>,
> > "'Cran at r-project.org'" <Cran at r-project.org>
> >Subject: [Rd] New class: data.table
> >
> >
> >Hi,
> >
> >Following previous discussion on this list
> >(http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html) I have
> >created a package as suggested, and uploaded it to CRAN incoming :
> >data.table.tar.gz.
> >
> >** Your comments and feedback will be very much appreciated. **
> >
> >
> >
> >From help(data.table):
> >
> >This class really does very little. The only reason for its existence
> >is that the white book specifies that data.frame must have rownames.
> >
> >Most of the code is copied from base functions with the code
> >manipulating row.names removed.
> >
> >A data.table is identical to a data.frame other than:
> > * it doesn't have rownames
> > * [,drop] by default is FALSE, so selecting a single row will always
> >   return a single row data.table not a vector
> > * The comma is optional inside [], so DT[3] returns the 3rd row as a
> >   1 row data.table
> > * [] is like a call to subset()
> > * [,...], is like a call to with(). (not yet implemented)
> >
> >Motivation:
> > * up to 10 times less memory
> > * up to 10 times faster to create, and copy
> > * simpler R code
> > * the white book defines rownames, so data.frame can't be changed
> >   ... => new class
> >
> >Examples:
> >nr = 1000000
> >D = rep(1:5,nr/5)
> >system.time(DF <<- data.frame(colA=D, colB=D)) # 2.08
> >system.time(DT <<- data.table(colA=D, colB=D)) # 0.15 (over 10 times faster to create)
> >identical(as.data.table(DF), DT)
> >identical(dim(DT),dim(DF))
> >object.size(DF)/object.size(DT) # 10 times less memory
> >
> >tt = subset(DF,colA>3)
> >ss = DT[colA>3]
> >identical(as.data.table(tt), ss)
> >
> >mean(subset(DF,colA+colB>5,"colB"))
> >mean(DT[colA+colB>5]$colB)
> >
> >tt = with(subset(DF,colA>3),colA+colB)
> >ss = with(DT[colA>3],colA+colB) # but could be: DT[colA>3,colA+colB] (not yet implemented)
> >identical(tt, ss)
> >
> >tt = DF[with(DF,tapply(1:nrow(DF),colB,last)),] # select last row grouping by colB
> >ss = DT[tapply(1:nrow(DT),colB,last)] # but could be: DT[last,group=colB] (not yet implemented)
> >identical(as.data.table(tt), ss)
> >
> >Lkp=1:3
> >tt = DF[with(DF,colA %in% Lkp),]
> >ss = DT[colA %in% Lkp] # expressions inside the [] can see objects in the calling frame
> >identical(as.data.table(tt), ss)
> >
> >In each case above there is either a space, time, or code brevity
> >advantage with the data.table.
> >
> >The motivation for the new class grew from the realization that
> >performance of data.frames can be improved by removing the rownames.
> >See here for the previous discussion:
> >http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html
> >
> >Regards,
> >Matthew
> >
>
>
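For readers without data.table installed, the data.frame behaviours that the quoted help text compares against can be sketched in base R alone. This is only an illustration under assumptions, not code from the package: the names one_col, r_vec, r_df, sel and total are made up here, and data.table itself is never called.

```r
# Base R only; data.table is deliberately not loaded.
DF <- data.frame(colA = rep(1:5, 2), colB = rep(1:5, 2))

# A data.frame carries row names even when none were supplied.
rn <- rownames(DF)

# With a single column, selecting one row drops to a plain vector by default;
# drop = FALSE keeps a 1-row data.frame. The quoted help text says data.table
# makes drop = FALSE the default.
one_col <- data.frame(colA = 1:5)          # hypothetical one-column example
r_vec <- one_col[3, ]                      # vector
r_df  <- one_col[3, , drop = FALSE]        # 1-row data.frame

# The subset() and with() calls that DT[colA>3] and DT[colA>3, colA+colB]
# are described as shorthand for.
sel   <- subset(DF, colA > 3)
total <- with(sel, colA + colB)
```

Nothing here measures memory or speed; it only shows the row-name and drop semantics that the new class is said to change.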