[Bioc-devel] Subsetting Lists by Lists

Hervé Pagès hpages at fhcrc.org
Wed Apr 2 00:24:28 CEST 2014


On 04/01/2014 02:43 PM, Michael Lawrence wrote:
> Thanks Herve. It might not be so bad to have rep out in the unnamed case
> (think of NULL names meaning wildcard). If we had:
>
> i <- IntegerList(1:5)
> x[i]
>
> The 'i' does not really identify any one element in 'x'. If both 'i' and
> 'x' had names, then there would be a matching, but otherwise, truncating
> 'x' to length(i) is surprising, and it's hard to imagine a use-case for
> it. In some ways, this is analogous to logical indexing, which is recycled.
>
> But that said, my use case is really more of a pluckHead/Tail. Don't
> worry about this release.

The pseq_len() utility I sent previously solves your pluckHead()
problem:

   pluckHead <- function(x, n=6)
   {
     x[pseq_len(pmin(elementLengths(x), n))]
   }

or, using the non-exported utility IRanges:::fancy_mseq():

   pluckHead <- function(x, n=6)
   {
     x_eltlens <- unname(elementLengths(x))
     i_eltlens <- pmin(x_eltlens, n)
     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
     unlisted_i <- IRanges:::fancy_mseq(i_eltlens)
     i <- relist(unlisted_i, i_skeleton)
     x[i]
   }

For pluckTail():

   pluckTail <- function(x, n=6)
   {
     x_eltlens <- unname(elementLengths(x))
     i_eltlens <- pmin(x_eltlens, n)
     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
     offset <- x_eltlens - i_eltlens
     unlisted_i <- IRanges:::fancy_mseq(i_eltlens, offset)
     i <- relist(unlisted_i, i_skeleton)
     x[i]
   }
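
As a rough illustration of the idea behind both functions (build one flat
integer subscript, subset once, then regroup), here is a minimal sketch on
an ordinary list using base R only. It is not how IRanges implements this:
the helper name flat_tail() is made up for the example, and the sketch
assumes that fancy_mseq(lengths, offset) produces, for each element, the
run offset+1, ..., offset+length.

```r
# Hypothetical base-R analogue of pluckTail(); not the IRanges implementation.
flat_tail <- function(x, n = 6L)
{
  x_eltlens <- lengths(x, use.names = FALSE)
  i_eltlens <- pmin(x_eltlens, n)            # 'n' is recycled by pmin()
  offset <- x_eltlens - i_eltlens            # positions to skip in each element
  starts <- cumsum(x_eltlens) - x_eltlens    # where each element begins in unlist(x)
  # One flat subscript into unlist(x): within-element position + skip + start.
  flat_i <- sequence(i_eltlens) + rep(offset + starts, i_eltlens)
  # Regroup the selected values; explicit factor levels keep empty elements.
  grp <- factor(rep(seq_along(x), i_eltlens), levels = seq_along(x))
  ans <- split(unlist(x, use.names = FALSE)[flat_i], grp)
  names(ans) <- names(x)
  ans
}

x <- list(a = 1:10, b = 1:3, c = integer(0))
res <- flat_tail(x, 4L)
```

For the list above, 'res' holds 7:10 for 'a', 1:3 for 'b' and an empty
integer vector for 'c'. Setting 'offset' to 0 everywhere would give the
head version instead.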

For both, 'n' can be of length > 1 and is recycled to the length of 'x'.
Negative values in 'n' are not supported but that should be easy to
add.

So I could add these 2 functions to IRanges. However, I'm not totally
convinced by the names. What about phead() and ptail() ("p" for
"parallel"), or vhead() and vtail() ("v" for "vectorized"), or mhead()
and mtail() (they're just fast equivalents of 'mapply(head, x, n)' and
'mapply(tail, x, n)'), or...?
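
To make the intended semantics concrete, the mapply() equivalence can be
sketched on an ordinary list with base R only. The names ref_head() and
ref_tail() are hypothetical and used only for this illustration. Note that
SIMPLIFY=FALSE is needed so that mapply() always returns a list.

```r
x <- list(a = 1:10, b = 1:3, c = integer(0))

# SIMPLIFY = FALSE keeps a list even when all results have the same length.
ref_head <- function(x, n = 6L) mapply(head, x, n, SIMPLIFY = FALSE)
ref_tail <- function(x, n = 6L) mapply(tail, x, n, SIMPLIFY = FALSE)

h1  <- ref_head(x, 2L)              # a scalar 'n' is recycled across 'x'
h2  <- ref_head(x, c(2L, 5L, 4L))   # one 'n' per element of 'x'
t2  <- ref_tail(x, c(2L, 5L, 4L))
neg <- ref_head(x, -2L)             # base head() drops the last 2 of each element
```

The last call shows what negative values in 'n' mean for base head(): all
but the last 2 values of each element are kept, which is the behavior a
negative-'n' extension would presumably have to match.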

Thanks,
H.

>
> Michael
>
>
>
> On Tue, Apr 1, 2014 at 12:06 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     On 04/01/2014 10:17 AM, Ryan wrote:
>
>         That won't work if any vector has fewer than 5 elements. Maybe
>
>         lapply(x, head, n=5)
>
>         would work?
>
>
>     Yes. Note that you can use endoapply() to preserve the class of the
>     original object:
>
>        > endoapply(cvg, head, n=5)
>
>        RleList of length 3
>        $chr1
>        integer-Rle of length 5 with 2 runs
>          Lengths: 4 1
>          Values : 1 2
>
>
>        $chr2
>        integer-Rle of length 5 with 4 runs
>          Lengths: 1 1 1 2
>          Values : 0 1 2 3
>
>        $chr3
>        integer-Rle of length 5 with 1 run
>          Lengths: 5
>          Values : 0
>
>     But lapply- or endoapply-based solutions are slower than a '['-based
>     solution. Unfortunately the latter requires too much munging to get
>     the subscript right:
>
>        ## parallel seq_len()
>        pseq_len <- function(eltlens)
>        {
>          ans_skeleton <- PartitioningByWidth(eltlens)
>          tmp <- relist(seq_len(sum(eltlens)), ans_skeleton)
>          tmp - start(ans_skeleton) + 1L
>        }
>
>     Then:
>
>        > pseq_len(c(5, 1, 0, 2))
>        IntegerList of length 4
>        [[1]] 1 2 3 4 5
>        [[2]] 1
>        [[3]] integer(0)
>        [[4]] 1 2
>
>        > cvg[pseq_len(pmin(elementLengths(cvg), 5))]
>
>        RleList of length 3
>        $chr1
>        integer-Rle of length 5 with 2 runs
>          Lengths: 4 1
>          Values : 1 2
>
>
>        $chr2
>        integer-Rle of length 5 with 4 runs
>          Lengths: 1 1 1 2
>          Values : 0 1 2 3
>
>        $chr3
>        integer-Rle of length 5 with 1 run
>          Lengths: 5
>          Values : 0
>
>     H.
>
>
>
>         On Tue Apr  1 09:24:51 2014, Cook, Malcolm wrote:
>
>             in the meantime,
>
>             lapply(x, `[`, 1:5)
>
>             ??
>
>                >-----Original Message-----
>                >From: bioc-devel-bounces at r-project.org
>                >[mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
>                >Sent: Tuesday, April 01, 2014 9:21 AM
>                >To: bioc-devel at r-project.org
>                >Subject: [Bioc-devel] Subsetting Lists by Lists
>                >
>                >Mostly to Herve:
>                >
>                >Sometimes we want to pluck the first 1, or 10, or whatever elements from
>                >each element of a list. If I had a list 'x', I thought I could do this with:
>                >
>                >x[IntegerList(1:5)]
>                >
>                >But it only gives elements 1:5 from x[[1]], not each element of 'x'. In
>                >other words, I thought the index would be repped out. Instead, 'x' is
>                >subset to the length of 'i', and I'm not sure if that makes sense?
>                >
>                >But maybe what we really want are pluckHead/Tail, which would be robust to
>                >the case that < n elements are in an element. And of course a more general
>                >pluck(x, i) to select 'i' from each element, but I wanted the line above to
>                >do that.
>                >
>                >Michael
>                >
>                >_______________________________________________
>                >Bioc-devel at r-project.org mailing list
>                >https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


