[Bioc-sig-seq] AlignedRead and complex subsetting

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 2 01:41:29 CEST 2009


Hi Ivan --

Steve Lianoglou wrote:
> Hi Ivan,
> 
> On Aug 31, 2009, at 4:54 PM, Ivan Gregoretti wrote:
> 
>> Hello Everybody,
>>
>> How do you subset an AlignedRead instance to keep (or reject) tags
>> that lay within a set of genomic regions?
>>
>>
>> Example
>>
>> Lets say that I have an AlignedRead instance called aln.
>>
>> Now let's say that I have a set of positions in BED style:
>>
>> (chromosome, start end)
>> ch1 1000000 1000050
>> chrX 20000000 20100000
>> ...(many more)...
>>
>> We can imagine that I have the BED set loaded as a data frame.
>>
>> Is it possible to pick from aln only the tags within (or outside) the
>> features defined in the table described above?
> 
> I think that you should convert your BED file to an IRanges object, and
> use overlap with your ranges + your readAligned object to get what your

I wrote this, as a trial implementation of %in%. Is that useful? (also
!(... %in% ...) though that would, e.g., include NAs.


setMethod("%in%", c("AlignedRead", "RangesList"),
          function(x, table)
{
    ## consider only sensible alignemnts
    chr <- chromosome(x)
    pos <- position(x)
    wd <- width(x)
    notNA <- !(is.na(chr) | is.na(pos) | is.na(wd))
    ## find overlap
    chr <- chr[notNA]
    rl <- RangesList(mapply(IRanges, start=split(pos[notNA], chr),
                            width=split(wd[notNA], chr)))
    olap <- rl %in% table
    ## map to original indicies
    len <- seq_len(length(x))
    idx <- unlist(split(len[notNA], chr), use.names=FALSE)
    len %in% idx[unlist(olap)]
})

One would then

  aln[aln %in% rangesList]

Martin

> after. See Martin's post about something like this in this thread:
> 
> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html
> 
> To get the reads *outside* of your ranges, maybe you can call the
> ``gaps`` on your bed/ranges and then do the same thing ...  or perhaps 
> ``setdiff(ranges, aln)`` might work, too? (where aln is your IRanges
> converted alignedRead object (if necessary)).
> 
> -steve
> 
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   |  Memorial Sloan-Kettering Cancer Center
>   |  Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list