[R] List of lists? Data frames? (Or other data structures?)

Thu May 1 09:42:55 CEST 2003

On Wed, 30 Apr 2003, Roger Peng wrote:

> If you're talking about rows and columns, it seems like the appropriate
> data structure for you is the data frame.  I think your list of lists
> representation might get unwieldy after a while.  I can't really think of
> why a data frame would be any slower than a list of lists -- I've never
> experienced such behavior.
> 
> read.table() may be a little slower than scan() because read.table() reads
> in an entire file and then converts each of the columns into an
> appropriate data class.  So there is some post-processing going on.  It
> doesn't have anything to do with data frames vs. lists.

Only if you don't specify colClasses: if you do (and you would need the
same information to use scan()), there should be no performance penalty.
(Note that a matrix can be scan()-ed into a vector and the dimensions
added afterwards, and that will be faster.)
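
A minimal sketch of both points. The file contents and the names tmp,
mfile, x, x2, v and m are made up for illustration; they are not from
the original poster's data.

```r
# Hypothetical two-column file written to a temp location for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("aa bb", "1 A", "2 B"), tmp)

# Declaring the column classes up front lets read.table() skip type guessing.
x <- read.table(tmp, header = TRUE,
                colClasses = c("numeric", "character"))

# The scan() equivalent needs the same type information, via `what`.
x2 <- scan(tmp, what = list(aa = 0, bb = ""), skip = 1, quiet = TRUE)

# A purely numeric matrix: scan() into a vector, then add the dimensions.
mfile <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), mfile)
v <- scan(mfile, quiet = TRUE)
m <- matrix(v, nrow = 2, byrow = TRUE)  # the file is stored row by row
```

Whether the difference is measurable depends on the file size; on a
small file like this one it will not be.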

> 
> -roger
> _______________________________
> UCLA Department of Statistics
> http://www.stat.ucla.edu/~rpeng
> 
> On Thu, 1 May 2003, R A F wrote:
> 
> > Hi, I'm faced with the following problem and would appreciate some
> > advice.
> > 
> > I could have a data frame x that looks like this:
> >          aa          bb
> > a        1           "A"
> > b        2           "B"
> > 
> > The advantage of this is that I could access all the individual
> > components easily.  Also I could access all the rows and columns
> > easily.
> > 
> > Alternatively, I could have a list of lists that looks like this:
> > 
> > xprime <- list()
> > xprime$a <- list()
> > xprime$b <- list()
> > 
> > xprime$a$aa <- 1
> > xprime$a$bb <- "A"
> > 
> > xprime$b$aa <- 2
> > xprime$b$bb <- "B"
> > 
> > etc.
> > 
> > If speed is important, would a list of lists be faster than a data
> > frame? (I know, for example, that scan is supposed to be faster than
> > read.table, but I don't know if that is related to issues with data
> > frames.)
> > 
> > My problem with a list of lists, though, is that if I want to access
> > all the bb subcomponents, a naive method like this one failed:
> > 
> > y <- c( "a", "b" )
> > xprime[[ y ]]$bb (Does not work)

You are supposed to use [[ ]] to extract a single component. I don't think
you expected

> xprime[[ y ]]
[1] "A"

(which is what you get as of 1.7.0).
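
To spell that out: a vector inside [[ ]] is applied recursively, one
level per element, so it does not select several components at once.
The "A" above presumably comes from "b" being partially matched to "bb"
inside xprime$a. A small sketch, using the list from the question (r1
and sub are names chosen here for illustration):

```r
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))
y <- c("a", "b")

# Recursive indexing: this is the same as xprime[["a"]][["bb"]].
r1 <- xprime[[c("a", "bb")]]

# Single [ ] keeps several components and returns a sub-list.
sub <- xprime[y]
```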

> > So to get all the bb subcomponents I seem to need to loop, which may
> > slow things down (presumably).  But maybe people here know of a way.

Something is going to have to loop, so an explicit loop is probably not
slow.
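
For example, either of the following collects all the bb subcomponents.
This is only a sketch; bb and bb2 are names chosen here, and sapply() is
one of several functions that would do the looping for you.

```r
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))
y <- c("a", "b")

# Explicit loop over the requested rows, filling a preallocated vector.
bb <- character(length(y))
names(bb) <- y
for (nm in y) bb[nm] <- xprime[[nm]]$bb

# The same thing with sapply(), which does the looping internally.
bb2 <- sapply(xprime[y], function(row) row$bb)
```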

> > Finally what would be the "best" way given the constraint of quick
> > access to all rows, columns and individual components?
> > 
> > I'd appreciate your thoughts and comments.  Thanks very much.
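
For comparison, here is a sketch of the data frame from the question and
the three kinds of access asked about. The variable names col_bb, row_a
and cell are illustrative only; stringsAsFactors = FALSE is set so that
bb stays a character column.

```r
x <- data.frame(aa = c(1, 2), bb = c("A", "B"),
                row.names = c("a", "b"), stringsAsFactors = FALSE)

col_bb <- x$bb         # one column, as a vector
row_a  <- x["a", ]     # one row, as a one-row data frame
cell   <- x["b", "bb"] # one individual component
```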

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595