[R] pairlists (was: data manipulation function descriptions)
Luke Tierney
luke at stat.uiowa.edu
Fri Feb 14 23:25:04 CET 2003
On 14 Feb 2003, Peter Dalgaard BSA wrote:
> "Warnes, Gregory R" <gregory_r_warnes at groton.pfizer.com> writes:
>
> > > -----Original Message-----
> > > From: Luke Tierney [mailto:luke at stat.uiowa.edu]
> >
> > > R does not provide a pairlist data structure. This creates a dilemma
> > > when translating some list-based xlispstat code, or, more
> > > importantly, when implementing an algorithm for which parilists are
> > > the natural data structure to use.
> > > ...
> > > Pairlists were and still are used internally for many things.
> > > ...
> >
> > Wouldn't it, therefore, make sense to provide a 'pairlist' package which
> > exposes the internal pairlist structure and provides appropriate functions
> > (car, cdr, ...), instead of expecting people to keep re-implementing these
> > features?
>
> Some ancient consideration pops up here. We do actually expose
> pairlists in a few places (try mode(.Options)). Some people consider
> that this is a remnant and should be stamped out, but we might also
> consider doing what you suggest.
>
> The big problem with old R was not so much the pairlists but that they
> were used for representing objects of mode "list" so to get to X[[n]]
> you had to count through the list from the beginning which killed
> performance in some important cases. Then again, adding elements to a
> generic vector requires copying the whole thing. Of course all the
> legacy S code tended to do the former and not the latter, so generic
> vectors ended up winning.
>
> One or two reservations: With full lisp style access, could we end up
> with (circular) data structures that confuse the garbage collector?
> And might we -- supposing we allowed destructive list modifications --
> end up with strange semantics a la the .Alias mess we had for a while?
> Of course Luke would be the first to know about this.
>
> Then of course there is the question of reverse compatibility. I don't
> consider it much of a loss if R code doesn't run in Splus, but others
> might.
>
>
The garbage collector isn't a problem--it can handle circularities
fine. It's all the rest of the internals that would get confused (in
most cases blowing out the C stack). But you can only get
circularities if you allow destructive modification, which you don't
need for purely functional uses and which we did not allow in the old
days of parlists for generic vectors.
But I really don't think there is much benefit in making the internal
pairlist objects available (and potentially significant cost because
I'm not sure what would break). Writing a list package might be quite
a useful contribution, but it can be done with a pure R representation
using, say, S4 classes. Looking at the list functions provided in CL
and in the Standard ML library might be a good way to start for anyone
who wants to give it a go. There are also some interesting choices of
semantics. Here are two different implementations
Cons <- function(x, y) list(x,y)
First <- function(x) x[[1]]
Rest <- function(x) x[[2]]
Cons <- function(x, y) list(function() x, function() y)
First <- function(x) x[[1]]()
Rest <- function(x) x[[2]]()
Performance is different (not necessarily the direction you might
expect) and semantics are different (think about what
intsFrom <- function(n) Cons(n, intsFrom(n + 1))
does)
luke
--
Luke Tierney
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On 14 Feb 2003, Peter Dalgaard BSA wrote:
> "Warnes, Gregory R" <gregory_r_warnes at groton.pfizer.com> writes:
>
> > > -----Original Message-----
> > > From: Luke Tierney [mailto:luke at stat.uiowa.edu]
> >
> > > R does not provide a pairlist data structure. This creates a dilemma
> > > when translating some list-based xlispstat code, or, more
> > > importantly, when implementing an algorithm for which parilists are
> > > the natural data structure to use.
> > > ...
> > > Pairlists were and still are used internally for many things.
> > > ...
> >
> > Wouldn't it, therefore, make sense to provide a 'pairlist' package which
> > exposes the internal pairlist structure and provides appropriate functions
> > (car, cdr, ...), instead of expecting people to keep re-implementing these
> > features?
>
> Some ancient consideration pops up here. We do actually expose
> pairlists in a few places (try mode(.Options)). Some people consider
> that this is a remnant and should be stamped out, but we might also
> consider doing what you suggest.
>
> The big problem with old R was not so much the pairlists but that they
> were used for representing objects of mode "list" so to get to X[[n]]
> you had to count through the list from the beginning which killed
> performance in some important cases. Then again, adding elements to a
> generic vector requires copying the whole thing. Of course all the
> legacy S code tended to do the former and not the latter, so generic
> vectors ended up winning.
>
> One or two reservations: With full lisp style access, could we end up
> with (circular) data structures that confuse the garbage collector?
> And might we -- supposing we allowed destructive list modifications --
> end up with strange semantics a la the .Alias mess we had for a while?
> Of course Luke would be the first to know about this.
>
> Then of course there is the question of reverse compatibility. I don't
> consider it much of a loss if R code doesn't run in Splus, but others
> might.
>
>
--
Luke Tierney
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
More information about the R-help
mailing list