[R-sig-Geo] AttributeList and data.table

Roger Bivand Roger.Bivand at nhh.no
Thu Apr 13 21:26:22 CEST 2006


On Thu, 13 Apr 2006, Edzer J. Pebesma wrote:

> Pedro, you're very alert! I saw it too, and had similar thoughts. However,
> I haven't had any complaints yet about the way AttributeLists work right
> now; most of it is hidden behind the scenes anyway. If data.table usage
> becomes widespread we can certainly provide coercion functions between the
> two. Let's first wait until it actually hits CRAN. I'm for instance curious
> what happens if you pass one to lm().

I agree. If it starts gaining momentum, pooling efforts would save 
maintenance, although data.frame objects are very prevalent in modelling 
code. But:

> library(sp)
> x <- runif(5000)
> y <- runif(5000)
> z <- rnorm(5000)
> e <- rnorm(5000,0,0.1)
> ze <- 0.3 - 0.1*z + e
> al <- AttributeList(list(z=z, ze=ze))
> spdf <- SpatialPointsDataFrame(cbind(x,y), data=al)
> lm(ze ~ z, spdf)

Call:
lm(formula = ze ~ z, data = spdf)

Coefficients:
(Intercept)            z  
    0.30141     -0.09917  

does very nicely already. Nice work, that AttributeList object!
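
For readers wondering why lm() can accept an object that is not a 
data.frame: base R's model.frame.default() calls as.data.frame() on a 
classed data argument that is neither a data.frame nor an environment. The 
sketch below illustrates that mechanism with a hypothetical toy class 
(toyAttributes, which is not part of sp); whether sp relies on exactly this 
route is an assumption here.

```r
# Minimal sketch in base R only. "toyAttributes" is a hypothetical class
# used for illustration; it is NOT sp's AttributeList.
set.seed(1)
n <- 5000
z <- rnorm(n)
e <- rnorm(n, 0, 0.1)

# A plain list of equal-length columns carrying a class attribute.
al <- structure(list(z = z, ze = 0.3 - 0.1 * z + e),
                class = "toyAttributes")

# Remove the loose vectors so lm() can only find z and ze through 'al'.
rm(z, e)

# Coercion method: drop the class and let as.data.frame() handle the list.
as.data.frame.toyAttributes <- function(x, ...) {
  as.data.frame(unclass(x), ...)
}

# model.frame.default() coerces 'al' with the method above before fitting.
fit <- lm(ze ~ z, data = al)
print(coef(fit))
```

Without the as.data.frame method, the same lm() call stops with a coercion 
error, which is why a coercion method is the piece that matters for 
modelling functions.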

Roger

> --
> Edzer
> 
> pedro at dpi.inpe.br wrote:
> 
> >Hi,
> >
> >There is a quite new package on CRAN called data.table. It implements 
> >the class data.table representing a data.frame without rownames, in 
> >order to improve performance. So, it has the same objective as the sp 
> >class AttributeList. I confess that I have only a superficial knowledge 
> >of the functionality available in both classes, but I think the projects 
> >could work together, or even be merged.
> >
> >Best wishes,
> >
> >Pedro Andrade
> >
> >---------- Forwarded message ----------
> >Date: Wed, 12 Apr 2006 15:19:10 +0100
> >From: Matthew Dowle <mdowle at concordiafunds.com>
> >To: "'r-devel at r-project.org'" <r-devel at r-project.org>,
> >      "'Cran at r-project.org'" <Cran at r-project.org>
> >Subject: [Rd] New class: data.table
> >
> >
> >Hi,
> >
> >Following previous discussion on this list
> >(http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html) I have created a
> >package as suggested, and uploaded it to CRAN incoming : data.table.tar.gz.
> >
> >** Your comments and feedback will be very much appreciated. **
> >
> >From help(data.table):
> >
> >This class really does very little. The only reason for its existence is
> >that the white book specifies that data.frame must have rownames.
> >
> >Most of the code is copied from base functions with the code manipulating
> >row.names removed.
> >
> >A data.table is identical to a data.frame other than:
> >  	* it doesn't have rownames
> >  	* [,drop] by default is FALSE, so selecting a single row will always
> >  	  return a single row data.table not a vector
> >  	* The comma is optional inside [], so DT[3] returns the 3rd row as a
> >  	  1 row data.table
> >  	* [] is like a call to subset()
> >  	* [,...], is like a call to with().  (not yet implemented)
> >
> >Motivation:
> >  	* up to 10 times less memory
> >  	* up to 10 times faster to create, and copy
> >  	* simpler R code
> >  	* the white book defines rownames, so data.frame can't be changed
> >  	  ... => new class
> >
> >Examples:
> >nr = 1000000
> >D = rep(1:5,nr/5)
> >system.time(DF <<- data.frame(colA=D, colB=D))  # 2.08
> >system.time(DT <<- data.table(colA=D, colB=D))  # 0.15  (over 10 times faster to create)
> >identical(as.data.table(DF), DT)
> >identical(dim(DT),dim(DF))
> >object.size(DF)/object.size(DT)                 # 10 times less memory
> >
> >tt = subset(DF,colA>3)
> >ss = DT[colA>3]
> >identical(as.data.table(tt), ss)
> >
> >mean(subset(DF,colA+colB>5,"colB"))
> >mean(DT[colA+colB>5]$colB)
> >
> >tt = with(subset(DF,colA>3),colA+colB)
> >ss = with(DT[colA>3],colA+colB)                 # but could be: DT[colA>3,colA+colB]  (not yet implemented)
> >identical(tt, ss)
> >
> ># select last row grouping by colB
> >tt = DF[with(DF,tapply(1:nrow(DF),colB,last)),]
> >ss = DT[tapply(1:nrow(DT),colB,last)]           # but could be: DT[last,group=colB]  (not yet implemented)
> >identical(as.data.table(tt), ss)
> >
> >Lkp=1:3
> >tt = DF[with(DF,colA %in% Lkp),]
> >ss = DT[colA %in% Lkp]                        # expressions inside the [] can see objects in the calling frame
> >identical(as.data.table(tt), ss)
> >
> >In each case above there is either a space, time, or code brevity advantage
> >with the data.table.
> >
> >The motivation for the new class grew from the realization that performance
> >of data.frames can be improved by removing the rownames.  See here for the
> >previous discussion
> >http://tolstoy.newcastle.edu.au/R/devel/05/12/3439.html.
> >
> >Regards,
> >Matthew
> >
> >______________________________________________
> >R-devel at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >_______________________________________________
> >R-sig-Geo mailing list
> >R-sig-Geo at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/r-sig-geo

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



