[Rd] [R] How to create a list that grows automatically

Sun Mar 25 15:43:22 CEST 2007

On 3/25/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> [Moved to R-devel for a technical comment.]
>
> On Fri, 9 Mar 2007, hadley wickham wrote:
>
> >> I would like to know if there is a way to create a list or an array (or
> >> anything) which grows automatically as more elements are put into it. What I
> >> want to find is something equivalent to an ArrayList object of Java
> >> language. In Java, I can do the following thing:
> >>
> >> // Java code
> >> ArrayList myArray = new ArrayList();
> >> myArray.add("object1");
> >> myArray.add("object2");
> >> ....
> >> // End of java code
> >
> > As others have mentioned, you can do this with lists in R.
> >
> > However, there is an important difference between ArrayLists in Java
> > and Lists in R.  In Java, when an ArrayList grows past its bound, it
> > doesn't allocate just enough space, it allocates a lot more, so the
> > next time you allocate past the end of the array, there's space
> > already reserved.  This gives (IIRC) amortised O(n) behaviour.  R
> > doesn't do this however, so has to copy the entire array every time
> > giving O(n^2) behaviour.
>
> In fact this is an implementation detail.  R has both 'length' and
> 'truelength' fields in its headers for vectors (including lists) and could
> grow the allocation in the same way as you report Java does.  When I asked
> Ross what the intention had been (the 'truelength' field is almost unused)
> he mentioned this potential usage.  Given that these structures are opaque
> to all but R internal code it should not be hard to change R's scheme to
> over-allocate: to decide how much to do would be harder (but say rounding
> vectors in the large allocation class up to a VM page would get a
> noticeable benefit in some usages with a negligible impact on memory
> footprint).  Backwards compatibility of save() format would be an issue.

Interesting - it would be fascinating (but difficult!) to find out how
much such a change would affect the average running time of all R
code.

> It seems the really inefficient uses are of the type
>
> x <- NULL
> for(i in 1:10000) x <- c(x, fn(i))
>
> and those would be unaltered.

Does R do any manipulations of the AST after the code has been parsed?
 If it did, wouldn't it be fairly easy to recognise this idiom and
optimise it to x[length(x) + 1] <- fn(i) ?  It has something of the
flavour of tail recursion optimisation.  Maybe this is something that
Luke is working on in his byte-code compiler.

Hadley