[R-sig-Geo] AttributeList and data.table

Matthew Dowle mdowle at concordiafunds.com
Tue Apr 18 11:26:17 CEST 2006


data.table now appears to be on CRAN.  However, given Prof Ripley's mail to
r-devel on Friday ("row.names in data.frame"), it would seem data.frame
itself can be changed after all, so data.table could be removed.

> -----Original Message-----
> From: Edzer J. Pebesma [mailto:e.pebesma at geo.uu.nl] 
> Sent: 13 April 2006 20:43
> To: pedro at dpi.inpe.br
> Cc: r-sig-geo at stat.math.ethz.ch; Matthew Dowle
> Subject: Re: [R-sig-Geo] AttributeList and data.table
> 
> 
> Pedro, you're very alert! I saw it too, and had similar thoughts.
> However, I haven't had any complaints yet about the way AttributeLists
> work right now; most of it is hidden behind the scenes anyway. If
> data.table usage becomes widespread we can certainly provide coercion
> functions between the two. Let's first wait until it actually hits
> CRAN. I'm for instance curious what happens if you pass one to lm().
> --
> Edzer
> 
> pedro at dpi.inpe.br wrote:
> 
> >Hi,
> >
> >There is a quite new package on CRAN called data.table. It implements
> >the class data.table, representing a data.frame without rownames, in
> >order to improve performance. So it has the same objective as the sp
> >class AttributeList. I confess that I have only a superficial knowledge
> >of the functionality available in both classes, but I think the
> >projects could work together, or even be merged.
> >
> >Best wishes,
> >
> >Pedro Andrade
> >
> >---------- Forwarded message ----------
> >Date: Wed, 12 Apr 2006 15:19:10 +0100
> >From: Matthew Dowle <mdowle at concordiafunds.com>
> >To: "'r-devel at r-project.org'" <r-devel at r-project.org>,
> >      "'Cran at r-project.org'" <Cran at r-project.org>
> >Subject: [Rd] New class: data.table
> >
> >
> >Hi,
> >
> >Following previous discussion on this list
> >(http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html) I have 
> >created a package as suggested, and uploaded it to CRAN incoming : 
> >data.table.tar.gz.
> >
> >** Your comments and feedback will be very much appreciated. **
> >
> >From help(data.table):
> >
> >This class really does very little. The only reason for its existence
> >is that the white book specifies that data.frame must have rownames.
> >
> >Most of the code is copied from base functions with the code 
> >manipulating row.names removed.
> >
> >A data.table is identical to a data.frame other than:
> >  	* it doesn't have rownames
> >  	* [,drop] by default is FALSE, so selecting a single row will
> >  	  always return a single row data.table not a vector
> >  	* The comma is optional inside [], so DT[3] returns the 3rd row
> >  	  as a 1 row data.table
> >  	* [] is like a call to subset()
> >  	* [,...], is like a call to with().  (not yet implemented)
> >
> >Motivation:
> >  	* up to 10 times less memory
> >  	* up to 10 times faster to create, and copy
> >  	* simpler R code
> >  	* the white book defines rownames, so data.frame can't be
> >  	  changed ... => new class
> >
> >Examples:
> >nr = 1000000
> >D = rep(1:5,nr/5)
> >system.time(DF <<- data.frame(colA=D, colB=D))  # 2.08
> >system.time(DT <<- data.table(colA=D, colB=D))  # 0.15  (over 10 times faster to create)
> >identical(as.data.table(DF), DT)
> >identical(dim(DT),dim(DF))
> >object.size(DF)/object.size(DT)                 # 10 times less memory
> >
> >tt = subset(DF,colA>3)
> >ss = DT[colA>3]
> >identical(as.data.table(tt), ss)
> >
> >mean(subset(DF,colA+colB>5,"colB"))
> >mean(DT[colA+colB>5]$colB)
> >
> >tt = with(subset(DF,colA>3),colA+colB)
> >ss = with(DT[colA>3],colA+colB)                 # but could be: DT[colA>3,colA+colB]  (not yet implemented)
> >identical(tt, ss)
> >
> >tt = DF[with(DF,tapply(1:nrow(DF),colB,last)),] # select last row grouping by colB
> >ss = DT[tapply(1:nrow(DT),colB,last)]           # but could be: DT[last,group=colB]  (not yet implemented)
> >identical(as.data.table(tt), ss)
> >
> >Lkp=1:3
> >tt = DF[with(DF,colA %in% Lkp),]
> >ss = DT[colA %in% Lkp]                          # expressions inside the [] can see objects in the calling frame
> >identical(as.data.table(tt), ss)
> >
> >In each case above there is either a space, time, or code brevity 
> >advantage with the data.table.
> >
> >The motivation for the new class grew from the realization that
> >performance of data.frames can be improved by removing the rownames.
> >See here for the previous discussion:
> >http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html
> >
> >Regards,
> >Matthew
> >
> >______________________________________________
> >R-devel at r-project.org mailing list 
> >https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >_______________________________________________
> >R-sig-Geo mailing list
> >R-sig-Geo at stat.math.ethz.ch 
> >https://stat.ethz.ch/mailman/listinfo/r-sig-geo



