[Bioc-devel] Subsetting Lists by Lists

Hervé Pagès hpages at fhcrc.org
Fri Apr 4 08:36:06 CEST 2014


Added in IRanges 1.21.41.

H.

On 04/01/2014 06:15 PM, Michael Lawrence wrote:
> I like phead/ptail. I was going to write them, so thanks for taking care
> of it!
>
> Michael
>
>
>
> On Tue, Apr 1, 2014 at 3:24 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     On 04/01/2014 02:43 PM, Michael Lawrence wrote:
>
>         Thanks Herve. It might not be so bad to have the index repped out
>         in the unnamed case (think of NULL names meaning wildcard). If we had:
>
>         i <- IntegerList(1:5)
>         x[i]
>
>         The 'i' does not really identify any one element in 'x'. If both 'i'
>         and 'x' had names, then there would be a matching, but otherwise,
>         truncating 'x' to length(i) is surprising, and it's hard to imagine a
>         use-case for it. In some ways, this is analogous to logical indexing,
>         which is recycled.
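
For reference, the recycling of logical subscripts mentioned in the quoted paragraph can be seen in base R on an ordinary vector. This is only a minimal sketch: the variable names are made up for illustration and nothing here involves IRanges.

```r
# Base R: a logical subscript shorter than the vector is recycled along it,
# whereas an integer subscript selects exactly the positions it lists.
v <- 11:16
by_logical <- v[c(TRUE, FALSE)]   # TRUE/FALSE pattern repeated over all of v
by_integer <- v[1:2]              # only positions 1 and 2, no recycling
print(by_logical)
print(by_integer)
```

The integer case is the one that resembles the truncation discussed in the thread, while the logical case resembles the "repped out" behavior.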
>
>         But that said, my use case is really more of a pluckHead/Tail. Don't
>         worry about this release.
>
>
>     The pseq_len() utility I sent previously solves your pluckHead()
>     problem:
>
>        pluckHead <- function(x, n=6)
>        {
>          x[pseq_len(pmin(elementLengths(x), n))]
>        }
>
>     or, using the non-exported utility IRanges:::fancy_mseq():
>
>        pluckHead <- function(x, n=6)
>        {
>          x_eltlens <- unname(elementLengths(x))
>          i_eltlens <- pmin(x_eltlens, n)
>          i_skeleton <- PartitioningByEnd(cumsum(i_eltlens),
>                                          names=names(x))
>          unlisted_i <- IRanges:::fancy_mseq(i_eltlens)
>          i <- relist(unlisted_i, i_skeleton)
>          x[i]
>        }
>
>     For pluckTail():
>
>        pluckTail <- function(x, n=6)
>        {
>          x_eltlens <- unname(elementLengths(x))
>          i_eltlens <- pmin(x_eltlens, n)
>          i_skeleton <- PartitioningByEnd(cumsum(i_eltlens),
>                                          names=names(x))
>          offset <- x_eltlens - i_eltlens
>          unlisted_i <- IRanges:::fancy_mseq(i_eltlens, offset)
>          i <- relist(unlisted_i, i_skeleton)
>          x[i]
>        }
>
>     For both, 'n' can be of length > 1 and is recycled to the length of 'x'.
>     Negative values in 'n' are not supported but that should be easy to
>     add.
>
>     So I could add these 2 functions to IRanges. However, I'm not totally
>     convinced by the names. What about phead() and ptail() ("p" for
>     "parallel"), or vhead() and vtail() ("v" for "vectorized"), or mhead()
>     and mtail() (they're just fast equivalents of 'mapply(head, x, n)' and
>     'mapply(tail, x, n)'), or...?
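
The mapply() equivalence quoted above can be tried on an ordinary list in base R. The sketch below is a slow reference version for checking semantics, not the IRanges implementation; the names ref_phead() and ref_ptail() are hypothetical.

```r
# Hypothetical reference versions built directly on mapply(), as in the
# equivalence quoted above. 'n' is recycled to the length of 'x' by mapply().
ref_phead <- function(x, n = 6) mapply(head, x, n, SIMPLIFY = FALSE)
ref_ptail <- function(x, n = 6) mapply(tail, x, n, SIMPLIFY = FALSE)

x <- list(a = 1:10, b = 1:3, c = integer(0))

heads <- ref_phead(x, 5)           # one 'n' shared by every element
tails <- ref_ptail(x, c(2, 1, 4))  # one 'n' per element
str(heads)
str(tails)
```

Elements shorter than the requested 'n' come back whole, which is the robustness asked for at the start of the thread.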
>
>     Thanks,
>     H.
>
>
>         Michael
>
>
>
>         On Tue, Apr 1, 2014 at 12:06 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>              On 04/01/2014 10:17 AM, Ryan wrote:
>
>                  That won't work if any vector has fewer than 5 elements. Maybe
>
>                  lapply(x, head, n=5)
>
>                  would work?
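
The difference between the two approaches quoted above is easy to see on a plain integer vector in base R. This is a minimal sketch with made-up variable names; the behavior of `[` on other classes (for example Rle) may differ and is not shown here.

```r
# On a plain vector, `[` pads out-of-range positions with NA,
# while head() simply returns what is available.
short <- 1:3
padded  <- short[1:5]          # 1 2 3 NA NA
trimmed <- head(short, n = 5)  # 1 2 3

# Applied to each element of an ordinary list:
x <- list(a = 1:10, b = 1:3, c = integer(0))
heads <- lapply(x, head, n = 5)
str(heads)
```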
>
>
>              Yes. Note that you can use endoapply() to preserve the class of
>              the original object:
>
>                 > endoapply(cvg, head, n=5)
>
>                 RleList of length 3
>                 $chr1
>                 integer-Rle of length 5 with 2 runs
>                   Lengths: 4 1
>                   Values : 1 2
>
>
>                 $chr2
>                 integer-Rle of length 5 with 4 runs
>                   Lengths: 1 1 1 2
>                   Values : 0 1 2 3
>
>                 $chr3
>                 integer-Rle of length 5 with 1 run
>                   Lengths: 5
>                   Values : 0
>
>              But lapply- or endoapply-based solutions are slower than a
>              [-based solution. Unfortunately the latter requires too much
>              munging to get the subscript right:
>
>                 ## parallel seq_len()
>                 pseq_len <- function(eltlens)
>                 {
>                   ans_skeleton <- PartitioningByWidth(eltlens)
>                   tmp <- relist(seq_len(sum(eltlens)), ans_skeleton)
>                   tmp - start(ans_skeleton) + 1L
>                 }
>
>              Then:
>
>                 > pseq_len(c(5, 1, 0, 2))
>                 IntegerList of length 4
>                 [[1]] 1 2 3 4 5
>                 [[2]] 1
>                 [[3]] integer(0)
>                 [[4]] 1 2
>
>                 > cvg[pseq_len(pmin(elementLengths(cvg), 5))]
>
>
>                 RleList of length 3
>                 $chr1
>                 integer-Rle of length 5 with 2 runs
>                   Lengths: 4 1
>                   Values : 1 2
>
>
>                 $chr2
>                 integer-Rle of length 5 with 4 runs
>                   Lengths: 1 1 1 2
>                   Values : 0 1 2 3
>
>                 $chr3
>                 integer-Rle of length 5 with 1 run
>                   Lengths: 5
>                   Values : 0
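
The same "build one subscript, subset once" idea can be sketched on ordinary lists using only base R. This is an illustration under assumptions, not the IRanges code: pseq_len_base() and phead_flat() are hypothetical names, they return plain lists rather than IntegerList or RleList objects, and sequence() stands in for the role played by pseq_len() above.

```r
# Hypothetical base-R analogue of pseq_len(): one seq_len() per element.
pseq_len_base <- function(eltlens) {
  eltlens <- as.integer(eltlens)
  # Grouping factor with one level per element, so empty elements survive split().
  grp <- factor(rep(seq_along(eltlens), eltlens), levels = seq_along(eltlens))
  unname(split(sequence(eltlens), grp))
}

# Hypothetical head-of-each-element using a single `[` on the flattened data.
phead_flat <- function(x, n = 6) {
  x_eltlens <- unname(lengths(x))
  i_eltlens <- as.integer(pmin(x_eltlens, n))   # how many to keep per element
  offset <- cumsum(x_eltlens) - x_eltlens       # start of each element, minus 1
  flat_i <- sequence(i_eltlens) + rep(offset, i_eltlens)
  grp <- factor(rep(seq_along(x), i_eltlens), levels = seq_along(x))
  ans <- split(unlist(x, use.names = FALSE)[flat_i], grp)
  names(ans) <- names(x)
  ans
}

idx <- pseq_len_base(c(5, 1, 0, 2))
x <- list(a = 1:10, b = 1:3, c = integer(0))
heads <- phead_flat(x, 5)
str(idx)
str(heads)
```

On plain lists the result matches lapply(x, head, n = 5); whether the flattened version is actually faster would have to be measured.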
>
>              H.
>
>
>
>                  On Tue Apr  1 09:24:51 2014, Cook, Malcolm wrote:
>
>                      in the meantime,
>
>                      lapply(x, `[`, 1:5)
>
>                      ??
>
>                         >-----Original Message-----
>                         >From: bioc-devel-bounces at r-project.org
>                         >[mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
>                         >Sent: Tuesday, April 01, 2014 9:21 AM
>                         >To: bioc-devel at r-project.org
>                         >Subject: [Bioc-devel] Subsetting Lists by Lists
>                         >
>                         >Mostly to Herve:
>                         >
>                         >Sometimes we want to pluck the first 1, or 10, or whatever elements from
>                         >each element of a list. If I had a list 'x', I thought I could do this with:
>                         >
>                         >x[IntegerList(1:5)]
>                         >
>                         >But it only gives elements 1:5 from x[[1]], not each element of 'x'. In
>                         >other words, I thought the index would be repped out. Instead, 'x' is
>                         >subset to the length of 'i', and I'm not sure if that makes sense?
>                         >
>                         >But maybe what we really want are pluckHead/Tail, which would be robust to
>                         >the case that < n elements are in an element. And of course a more general
>                         >pluck(x, i) to select 'i' from each element, but I wanted the line above to
>                         >do that.
>                         >
>                         >Michael
>                         >
>                         >_______________________________________________
>                         >Bioc-devel at r-project.org mailing list
>                         >https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>              --
>              Hervé Pagès
>
>              Program in Computational Biology
>              Division of Public Health Sciences
>              Fred Hutchinson Cancer Research Center
>              1100 Fairview Ave. N, M1-B514
>              P.O. Box 19024
>              Seattle, WA 98109-1024
>
>              E-mail: hpages at fhcrc.org
>              Phone: (206) 667-5791
>              Fax: (206) 667-1319
>
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


