[R-sig-Geo] AttributeList and data.table

Edzer J. Pebesma e.pebesma at geo.uu.nl
Thu Apr 13 21:43:17 CEST 2006


Pedro, you're very alert! I saw it too, and had similar thoughts.
However, I haven't had any complaints yet about the way AttributeLists
work right now; most of it is hidden behind the scenes anyway. If
data.table usage becomes widespread we can certainly provide coercion
functions between the two. Let's first wait until it actually hits
CRAN. I'm for instance curious what happens if you pass one to lm().
--
Edzer

pedro at dpi.inpe.br wrote:

>Hi,
>
>There is a fairly new package on CRAN called data.table. It implements 
>the class data.table, representing a data.frame without rownames, in 
>order to improve performance. So it has the same objective as the sp 
>class AttributeList. I confess that my knowledge of the functionality 
>available in both classes is superficial, but I think the projects 
>could work together, or even be merged.
>
>Best wishes,
>
>Pedro Andrade
>
>---------- Forwarded message ----------
>Date: Wed, 12 Apr 2006 15:19:10 +0100
>From: Matthew Dowle <mdowle at concordiafunds.com>
>To: "'r-devel at r-project.org'" <r-devel at r-project.org>,
>      "'Cran at r-project.org'" <Cran at r-project.org>
>Subject: [Rd] New class: data.table
>
>
>Hi,
>
>Following previous discussion on this list
>(http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html) I have created a
>package as suggested, and uploaded it to CRAN incoming : data.table.tar.gz.
>
>** Your comments and feedback will be very much appreciated. **
>
>
>From help(data.table):
>
>This class really does very little. The only reason for its existence is
>that the white book specifies that data.frame must have rownames.
>
>Most of the code is copied from base functions with the code manipulating
>row.names removed.
>
>A data.table is identical to a data.frame other than:
>  	* it doesn't have rownames
>  	* [,drop] by default is FALSE, so selecting a single row will always
>  	  return a single row data.table not a vector
>  	* The comma is optional inside [], so DT[3] returns the 3rd row as a
>  	  1 row data.table
>  	* [] is like a call to subset()
>  	* [,...], is like a call to with().  (not yet implemented)
>
>Motivation:
>  	* up to 10 times less memory
>  	* up to 10 times faster to create, and copy
>  	* simpler R code
>  	* the white book defines rownames, so data.frame can't be changed
>  	  ... => new class
>
>Examples:
>nr = 1000000
>D = rep(1:5,nr/5)
>system.time(DF <<- data.frame(colA=D, colB=D))  # 2.08
>system.time(DT <<- data.table(colA=D, colB=D))  # 0.15  (over 10 times faster to create)
>identical(as.data.table(DF), DT)
>identical(dim(DT),dim(DF))
>object.size(DF)/object.size(DT)                 # 10 times less memory
>
>tt = subset(DF,colA>3)
>ss = DT[colA>3]
>identical(as.data.table(tt), ss)
>
>mean(subset(DF,colA+colB>5,"colB"))
>mean(DT[colA+colB>5]$colB)
>
>tt = with(subset(DF,colA>3),colA+colB)
>ss = with(DT[colA>3],colA+colB)                 # but could be: DT[colA>3,colA+colB]  (not yet implemented)
>identical(tt, ss)
>
>tt = DF[with(DF,tapply(1:nrow(DF),colB,last)),] # select last row grouping by colB
>ss = DT[tapply(1:nrow(DT),colB,last)]           # but could be: DT[last,group=colB]  (not yet implemented)
>identical(as.data.table(tt), ss)
>
>Lkp=1:3
>tt = DF[with(DF,colA %in% Lkp),]
>ss = DT[colA %in% Lkp]                        # expressions inside the [] can see objects in the calling frame
>identical(as.data.table(tt), ss)
>
>In each case above there is either a space, time, or code brevity advantage
>with the data.table.
>
>The motivation for the new class grew from the realization that performance
>of data.frames can be improved by removing the rownames.  See here for the
>previous discussion
>http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html.
>
>Regards,
>Matthew
>
>______________________________________________
>R-devel at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel
>
>_______________________________________________
>R-sig-Geo mailing list
>R-sig-Geo at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
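
A minimal base-R sketch of two points from the quoted announcement: row
names are always present on a data.frame, and the default drop behaviour
of "[" returns a bare vector for a single column. It uses only
data.frame; the data.table class itself is deliberately not called,
since its behaviour is known here only from the announcement. The object
names DF, rn, v and d are illustrative assumptions.

```r
# A plain data.frame always carries row names, even when none are supplied.
DF <- data.frame(colA = 1:5, colB = 6:10)
rn <- rownames(DF)                  # automatic row names, returned as character

# Selecting one column with the default drop behaviour gives a bare vector;
# passing drop = FALSE keeps the data.frame structure.
v <- DF[, "colA"]
d <- DF[, "colA", drop = FALSE]

print(rn)
print(is.data.frame(v))
print(is.data.frame(d))
```

According to the announcement, data.table makes the drop = FALSE
behaviour the default, so a subset keeps its table structure.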