[Bioc-devel] Subsetting Lists by Lists
Hervé Pagès
hpages at fhcrc.org
Fri Apr 4 08:36:06 CEST 2014
phead() and ptail() added in IRanges 1.21.41.
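For readers without IRanges at hand, the per-element head/tail behaviour discussed in this thread can be approximated in base R. This is only a sketch on plain lists: the names phead_sketch() and ptail_sketch() are hypothetical, and it is the slow mapply-style equivalent mentioned below, not the IRanges implementation.

```r
# Base R stand-ins for per-element head/tail on a plain list.
# phead_sketch()/ptail_sketch() are hypothetical names, not the IRanges functions.
phead_sketch <- function(x, n = 6L) {
    # Map() recycles 'n' along 'x'; head() tolerates elements shorter than n
    Map(head, x, n)
}
ptail_sketch <- function(x, n = 6L) {
    Map(tail, x, n)
}

x <- list(a = 1:10, b = 1:3, c = integer(0))
phead_sketch(x, 5)            # a: 1..5, b: 1..3, c: empty
ptail_sketch(x, c(2, 1, 4))   # a: 9 10, b: 3, c: empty
```

Because 'n' is passed through Map(), a single value applies to every element and a vector of the same length as 'x' gives one value per element.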
H.
On 04/01/2014 06:15 PM, Michael Lawrence wrote:
> I like phead/ptail. I was going to write them, so thanks for taking care
> of it!
>
> Michael
>
>
>
> On Tue, Apr 1, 2014 at 3:24 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> On 04/01/2014 02:43 PM, Michael Lawrence wrote:
>
> Thanks Herve. It might not be so bad to have rep out in the unnamed case
> (think of NULL names meaning wildcard). If we had:
>
> i <- IntegerList(1:5)
> x[i]
>
> The 'i' does not really identify any one element in 'x'. If both 'i' and
> 'x' had names, then there would be a matching, but otherwise, truncating
> 'x' to length(i) is surprising, and it's hard to imagine a use-case for
> it. In some ways, this is analogous to logical indexing, which is recycled.
>
> But that said, my use case is really more of a pluckHead/Tail. Don't
> worry about this release.
>
>
> The pseq_len() utility I sent previously solves your pluckHead()
> problem:
>
> pluckHead <- function(x, n=6)
> {
>     x[pseq_len(pmin(elementLengths(x), n))]
> }
>
> or, using the non-exported utility IRanges:::fancy_mseq():
>
> pluckHead <- function(x, n=6)
> {
>     x_eltlens <- unname(elementLengths(x))
>     i_eltlens <- pmin(x_eltlens, n)
>     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
>     unlisted_i <- IRanges:::fancy_mseq(i_eltlens)
>     i <- relist(unlisted_i, i_skeleton)
>     x[i]
> }
>
> For pluckTail():
>
> pluckTail <- function(x, n=6)
> {
>     x_eltlens <- unname(elementLengths(x))
>     i_eltlens <- pmin(x_eltlens, n)
>     i_skeleton <- PartitioningByEnd(cumsum(i_eltlens), names=names(x))
>     offset <- x_eltlens - i_eltlens
>     unlisted_i <- IRanges:::fancy_mseq(i_eltlens, offset)
>     i <- relist(unlisted_i, i_skeleton)
>     x[i]
> }
>
> For both, 'n' can be of length > 1 and is recycled to the length of 'x'.
> Negative values in 'n' are not supported but that should be easy to
> add.
>
> So I could add these 2 functions to IRanges, however, I'm not totally
> convinced by the names. What about phead() and ptail() ("p" for
> "parallel"), or vhead() and vtail() ("v" for "vectorized"), or mhead()
> and mtail() (they're just fast equivalents of 'mapply(head, x, n)' and
> 'mapply(tail, x, n)'), or...?
>
> Thanks,
> H.
>
>
> Michael
>
>
>
> On Tue, Apr 1, 2014 at 12:06 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> On 04/01/2014 10:17 AM, Ryan wrote:
>
> That won't work if any vector has fewer than 5 elements. Maybe
>
> lapply(x, head, n=5)
>
> would work?
>
>
> Yes. Note that you can use endoapply() to preserve the class of the
> original object:
>
> > endoapply(cvg, head, n=5)
>
> RleList of length 3
> $chr1
> integer-Rle of length 5 with 2 runs
> Lengths: 4 1
> Values : 1 2
>
>
> $chr2
> integer-Rle of length 5 with 4 runs
> Lengths: 1 1 1 2
> Values : 0 1 2 3
>
> $chr3
> integer-Rle of length 5 with 1 run
> Lengths: 5
> Values : 0
>
> But lapply- or endoapply-based solutions are slower than a [ based
> solution. Unfortunately the latter requires too much munging to get
> the subscript right:
>
> ## parallel seq_len()
> pseq_len <- function(eltlens)
> {
>     ans_skeleton <- PartitioningByWidth(eltlens)
>     tmp <- relist(seq_len(sum(eltlens)), ans_skeleton)
>     tmp - start(ans_skeleton) + 1L
> }
>
> Then:
>
> > pseq_len(c(5, 1, 0, 2))
> IntegerList of length 4
> [[1]] 1 2 3 4 5
> [[2]] 1
> [[3]] integer(0)
> [[4]] 1 2
>
> > cvg[pseq_len(pmin(elementLengths(cvg), 5))]
>
> RleList of length 3
> $chr1
> integer-Rle of length 5 with 2 runs
> Lengths: 4 1
> Values : 1 2
>
>
> $chr2
> integer-Rle of length 5 with 4 runs
> Lengths: 1 1 1 2
> Values : 0 1 2 3
>
> $chr3
> integer-Rle of length 5 with 1 run
> Lengths: 5
> Values : 0
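The subscript-building idea above can also be tried on ordinary lists in base R. The sketch below is an assumption-laden stand-in: pseq_len_sketch() is a hypothetical name, it returns a plain list rather than an IntegerList, and it relies on base R's sequence() and split() instead of the IRanges partitioning classes.

```r
# Hypothetical base R counterpart of pseq_len(); returns a plain list.
pseq_len_sketch <- function(eltlens) {
    eltlens <- as.integer(eltlens)
    # sequence() concatenates seq_len(k) for each k in 'eltlens'
    flat <- sequence(eltlens)
    # Group ids; explicit factor levels keep zero-length elements as empty groups
    grp <- factor(rep.int(seq_along(eltlens), eltlens),
                  levels = seq_along(eltlens))
    unname(split(flat, grp))
}

x <- list(chr1 = c(1, 1, 1, 1, 2, 2), chr2 = c(0, 1), chr3 = numeric(0))
i <- pseq_len_sketch(pmin(lengths(x), 5))
Map(`[`, x, i)   # first (up to) 5 values of each element
```

On a plain list the final subsetting step still loops over elements via Map(), so this only illustrates how the per-element subscript is built, not the speed advantage of list-by-list `[` in IRanges.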
>
> H.
>
>
>
> On Tue Apr 1 09:24:51 2014, Cook, Malcolm wrote:
>
> in the mean time,
>
> lapply(`[`,x,IntegerList(1:5))
>
> ??
>
> >-----Original Message-----
> >From: bioc-devel-bounces at r-project.org [mailto:bioc-devel-bounces at r-project.org] On Behalf Of Michael Lawrence
> >Sent: Tuesday, April 01, 2014 9:21 AM
> >To: bioc-devel at r-project.org
> >Subject: [Bioc-devel] Subsetting Lists by Lists
> >
> >Mostly to Herve:
> >
> >Sometimes we want to pluck the first 1, or 10, or whatever elements from
> >each element of a list. If I had a list 'x', I thought I could do this with:
> >
> >x[IntegerList(1:5)]
> >
> >But it only gives elements 1:5 from x[[1]], not each element of 'x'. In
> >other words, I thought the index would be repped out. Instead, 'x' is
> >subset to the length of 'i', and I'm not sure if that makes sense?
> >
> >But maybe what we really want are pluckHead/Tail, which would be robust to
> >the case that < n elements are in an element. And of course a more general
> >pluck(x, i) to select 'i' from each element, but I wanted the line above to
> >do that.
> >
> >Michael
> >
> >___________________________________________________
> >Bioc-devel at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319