[Rd] Feature request: extend functionality of 'unlist()' by args 'delim=c("/", "_", etc.)' and 'keep.special=TRUE/FALSE'

Duncan Murdoch murdoch.duncan at gmail.com
Thu May 19 14:58:00 CEST 2011


On 19/05/2011 8:15 AM, Janko Thyson wrote:
> Dear list,
>
> I hope this is the right place to post a feature request. If there
> exists a more formal channel (e.g. as for bug reports), I'd appreciate a
> pointer.

This is a good place to post.

>
> I work a lot with named nested lists with arbitrary degrees of
> "nestedness". In order to retrieve the names and/or values of "bottom
> layer/bottom tier", I love the functionality of 'unlist()', or
> 'names(unlist(x))', respectively, as it avoids traversing the nested
> lists via recursive loop constructs. I'm also aware that the general
> suggestion is probably to keep nestedness as low as possible when
> working with lists, but arbitrarily deeply nested lists came in quite
> handy for me as long as each element is named and as long as I can
> quickly add and retrieve element values via "name paths".
>
> Here's a little example list:
> lst<- list(a=list(a.1=list(a.1.1=NA, a.1.2=5), a.2=list()), b=NULL)
>
> It would be awesome if 'unlist(x)' could be extended with the following
> functionality:
>
> 1) An argument such as 'delim' that controls how the respective layer
> names are pasted.
> Right now, they are always separated by a dot:
>   >  names(unlist(lst))
> [1] "a.a.1.a.1.1" "a.a.1.a.1.2"
> Desired:
>   >  names(unlist(lst, delim="/"))
> [1] "a/a.1/a.1.1" "a/a.1/a.1.2"
>   >  names(unlist(lst, delim="_"))
> [1] "a_a.1_a.1.1" "a_a.1_a.1.2"
>
> 2)  An argument that allows elements of zero length or of value NULL
> to be *included* in the resulting output.
> Right now, they are dropped (which makes perfect sense as NULL values
> and zero length values are dropped in vectors):
>   >  c(1,2, NULL, numeric())
> [1] 1 2
>   >  unlist(lst)
> a.a.1.a.1.1 a.a.1.a.1.2
>            NA           5
> Desired:
>   >  unlist(lst, delim="/", keep.special=TRUE)
> $a/a.1/a.1.1
> [1] NA
>
> $a/a.1/a.1.2
> [1] 5
>
> $a/a.2
> list()
>
> $b
> NULL
> Of course, this would not be a true 'unlist' anymore, but something like
> 'retrieveBottomLayer()'.
>
> Thanks a lot for providing such fast stuff as 'unlist()'! Unfortunately,
> I don't know my way around internal C routines and therefore I would
> greatly appreciate it if core team developers would consider my two
> suggestions.
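
For reference, both requests can be approximated at the R level with a 
small recursive function.  What follows is only a sketch under my own 
assumptions:  flatten_paths() and its 'prefix' argument are hypothetical 
names, not anything in base R, unnamed elements are labelled by 
position, and a leaf of length greater than one is kept as a single 
element rather than split up the way unlist() would.  It will also be 
much slower than the C code behind unlist().

```r
# Hypothetical helper (not part of base R): flatten a nested named list,
# joining the names of each level with 'delim'.
flatten_paths <- function(x, delim = "/", keep.special = FALSE, prefix = NULL) {
  out <- list()
  for (i in seq_along(x)) {
    nm <- names(x)[i]
    # Fall back to the position when an element has no usable name.
    if (is.null(nm) || is.na(nm) || !nzchar(nm)) nm <- as.character(i)
    path <- if (is.null(prefix)) nm else paste(prefix, nm, sep = delim)
    el <- x[[i]]
    if (is.list(el) && length(el) > 0L) {
      # Non-empty list: descend one level, carrying the path along.
      out <- c(out, flatten_paths(el, delim, keep.special, path))
    } else if (length(el) > 0L || keep.special) {
      # Ordinary leaf, or a NULL / zero-length element that was asked for.
      # Wrapping in list() keeps NULL as an element instead of dropping it.
      leaf <- list(el)
      names(leaf) <- path
      out <- c(out, leaf)
    }
  }
  out
}

lst <- list(a = list(a.1 = list(a.1.1 = NA, a.1.2 = 5), a.2 = list()), b = NULL)

names(flatten_paths(lst, delim = "/"))
# [1] "a/a.1/a.1.1" "a/a.1/a.1.2"

names(flatten_paths(lst, delim = "_", keep.special = TRUE))
# [1] "a_a.1_a.1.1" "a_a.1_a.1.2" "a_a.2"       "b"
```

With keep.special = TRUE the result is always a list, since the empty 
list and the NULL cannot be stored in an atomic vector.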

The suggestions seem reasonable, but are difficult to implement.  The 
problem is that unlist() is a generic function, but there's no 
unlist.default() in R:  the default and method dispatch are implemented 
at the C level.  Normally adding arguments to the default method doesn't 
cause problems elsewhere, because methods only need to be compatible 
with the generic.  But since there's no way to modify the argument list 
of the default method in this case, the generic function would need to 
be modified, and that means every unlist method would need to be 
modified too.
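
The constraint can be checked from the R prompt.  The lines below are 
just an illustration, assuming a plain session in which no attached 
package defines its own unlist.default():

```r
# The R-level function exposes only these three arguments.
names(formals(unlist))
# [1] "x"         "recursive" "use.names"

# There is no R-level default method that could be given extra arguments
# (FALSE in a plain session with no package supplying one).
exists("unlist.default")

# So an extra argument is rejected before any C code is reached.
res <- try(unlist(list(a = 1), delim = "/"), silent = TRUE)
inherits(res, "try-error")
# [1] TRUE
```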

So I wouldn't want to take this on.

In case someone else does, I'd suggest a different change than the 
"keep.special" argument.  I think a "coerce=TRUE" argument would be 
better:  If TRUE, you get the current behaviour, which coerces 
components according to the hierarchy listed on the help page.  If 
FALSE, then no coercion is done, and unlist() just flattens the list 
into a new one, e.g.

unlist( list(1, 2, NULL, list("A", "B")), coerce=FALSE)

would return list(1, 2, NULL, "A", "B") instead of c("1", "2", "A", "B").
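
To make the intended semantics concrete, here is a rough R-level 
sketch.  flatten_list() is a hypothetical name, not an existing 
function, and for simplicity it drops names and lets empty sub-lists 
vanish:

```r
# Hypothetical helper (not part of base R): flatten nested lists without
# coercing the leaves, so NULL and mixed types survive.
flatten_list <- function(x) {
  if (!is.list(x)) return(list(x))
  if (length(x) == 0L) return(list())
  # Each element becomes a list of leaves; list(NULL) keeps a NULL leaf.
  parts <- lapply(x, function(el) {
    if (is.list(el)) flatten_list(el) else list(el)
  })
  # Concatenate the pieces; unname() avoids clashes with c()'s own arguments.
  do.call(c, unname(parts))
}

x <- list(1, 2, NULL, list("A", "B"))

unlist(x)                    # current behaviour: coerced to character
# [1] "1" "2" "A" "B"

length(flatten_list(x))      # five elements, the NULL included
# [1] 5

identical(flatten_list(x), list(1, 2, NULL, "A", "B"))
# [1] TRUE
```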

One workaround I thought of was to add an element to the list that 
couldn't be coerced, but this doesn't work:  the result stays a list, 
yet the NULL is still dropped.  For example:

e <- environment() # can't be coerced
x <- list(1, 2, NULL, list("A", "B"), e)
unlist(x)

# Returns list(1,2,"A","B",e)

I think it would be reasonable for this version to retain the NULL, 
since it is not doing any coercion.

Duncan Murdoch


