[R] List of lists? Data frames? (Or other data structures?)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu May 1 09:42:55 CEST 2003
On Wed, 30 Apr 2003, Roger Peng wrote:
> If you're talking about rows and columns, it seems like the appropriate
> data structure for you is the data frame. I think your list of lists
> representation might get unwieldy after a while. I can't really think of
> why a data frame would be any slower than a list of lists -- I've never
> experienced such behavior.
> read.table() may be a little slower than scan() because read.table() reads
> in an entire file and then converts each of the columns into an
> appropriate data class. So there is some post-processing going on. It
> doesn't have anything to do with data frames vs. lists.
Only if you don't specify colClasses: if you do (and you would need the
information to use scan()) there should be no performance penalty. (Note
that matrices can be scan()-ed into a vector and the dimensions added, and
that will be faster.)
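
A minimal sketch of the two points above, assuming a small whitespace-separated file; the temporary files and their contents are made up for illustration and are not from the original thread:

```r
# Hypothetical two-column file, written to a temporary location.
tmp <- tempfile()
writeLines(c("aa bb", "1 A", "2 B"), tmp)

# With colClasses given, read.table() does not have to guess column types.
x <- read.table(tmp, header = TRUE, colClasses = c("numeric", "character"))

# scan() needs the same type information, supplied through 'what'.
cols <- scan(tmp, what = list(aa = 0, bb = ""), skip = 1, quiet = TRUE)

# A purely numeric file can be scanned into a vector and then given
# dimensions; byrow = TRUE because the file is read row by row.
mtmp <- tempfile()
writeLines(c("1 2 3", "4 5 6"), mtmp)
v <- scan(mtmp, quiet = TRUE)
m <- matrix(v, ncol = 3, byrow = TRUE)
```

Whether the difference is noticeable will depend on the file size; timing both with system.time() on the real data is the safest check.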
> UCLA Department of Statistics
> On Thu, 1 May 2003, R A F wrote:
> > Hi, I'm faced with the following problem and would appreciate some
> > advice.
> > I could have a data frame x that looks like this:
> >   aa  bb
> > a  1 "A"
> > b  2 "B"
> > The advantage of this is that I could access all the individual
> > components easily. Also I could access all the rows and columns
> > easily.
> > Alternatively, I could have a list of lists that looks like this:
> > xprime <- list()
> > xprime$a <- list()
> > xprime$b <- list()
> > xprime$a$aa <- 1
> > xprime$a$bb <- "A"
> > xprime$b$aa <- 2
> > xprime$b$bb <- "B"
> > etc.
> > If speed is important, would a list of lists be faster than a data
> > frame? (I know, for example, that scan is supposed to be faster than
> > read.table, but I don't know if that is related to issues with data
> > frames.)
> > My problem with a list of lists, though, is that if I want to access
> > all the bb subcomponents, a naive method like this one failed:
> > y <- c( "a", "b" )
> > xprime[[ y ]]$bb (Does not work)
You are supposed to use [[ ]] to extract a single component. I don't think
> xprime[[ y ]]
(as from 1.7.0).
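
A small sketch of the distinction, assuming a version of R in which [[ ]] accepts a vector index and applies it recursively (which is what the 1.7.0 remark appears to refer to). The list is rebuilt from the example in the question:

```r
# The list of lists from the question, built in one call.
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))

# [[ ]] with a single name extracts one component.
one <- xprime[["a"]]$bb

# A vector index is read as a path into the nested list, so this is
# xprime[["a"]][["bb"]], not "the bb of several components".
path <- xprime[[c("a", "bb")]]
```

Read this way, y <- c("a", "b") asks for a component named "b" inside xprime$a, which is not the same as selecting both a and b.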
> > So to get all the bb subcomponents I seem to need to loop, which may
> > slow things down (presumably). But maybe people here know of a way.
Something is going to have to loop, so it probably is not slow to use an
> > Finally what would be the "best" way given the constraint of quick
> > access to all rows, columns and individual components?
> > I'd appreciate your thoughts and comments. Thanks very much.
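
On collecting every bb subcomponent: a sketch of two ways to do the loop, followed by the data frame equivalent for comparison. This is one possible approach rather than a recommendation from the thread; the object names are taken from the question, and stringsAsFactors = FALSE is my assumption so that bb stays character:

```r
xprime <- list(a = list(aa = 1, bb = "A"),
               b = list(aa = 2, bb = "B"))

# sapply() does the looping internally and keeps the component names.
bb <- sapply(xprime, function(comp) comp$bb)

# The same thing as an explicit loop over the names.
bb2 <- character(length(xprime))
names(bb2) <- names(xprime)
for (nm in names(xprime)) bb2[nm] <- xprime[[nm]]$bb

# Data frame version: rows, columns and single cells by ordinary indexing.
x <- data.frame(aa = c(1, 2), bb = c("A", "B"),
                row.names = c("a", "b"), stringsAsFactors = FALSE)
col_bb  <- x$bb          # a whole column
row_b   <- x["b", ]      # a whole row
cell_ab <- x["a", "bb"]  # one component
```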
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595