[Bioc-sig-seq] AlignedRead and complex subsetting
Ivan Gregoretti
ivangreg at gmail.com
Wed Sep 2 18:47:43 CEST 2009
Hello Steve, Martin and Everybody,
First, thank you both for your help.
As suggested by Steve, I look at
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html
by it didn't quite applied to my case. I need to iterate over
chromosomes and I need to keep strand information.
I'll try this new algorithm Martin just wrote. By the way, Martin,
what kind of object should rangesList be?
Say that I load my regions like this
myRegions <- import('myregions.bed')
Can I coerce that IRanges instance myRegions to the rangestList class? How?
Thank you!
Ivan
Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
On Tue, Sep 1, 2009 at 7:41 PM, Martin Morgan<mtmorgan at fhcrc.org> wrote:
> Hi Ivan --
>
> Steve Lianoglou wrote:
>> Hi Ivan,
>>
>> On Aug 31, 2009, at 4:54 PM, Ivan Gregoretti wrote:
>>
>>> Hello Everybody,
>>>
>>> How do you subset an AlignedRead instance to keep (or reject) tags
>>> that lay within a set of genomic regions?
>>>
>>>
>>> Example
>>>
>>> Lets say that I have an AlignedRead instance called aln.
>>>
>>> Now let's say that I have a set of positions in BED style:
>>>
>>> (chromosome, start end)
>>> ch1 1000000 1000050
>>> chrX 20000000 20100000
>>> ...(many more)...
>>>
>>> We can imagine that I have the BED set loaded as a data frame.
>>>
>>> Is it possible to pick from aln only the tags within (or outside) the
>>> features defined in the table described above?
>>
>> I think that you should convert your BED file to an IRanges object, and
>> use overlap with your ranges + your readAligned object to get what your
>
> I wrote this, as a trial implementation of %in%. Is that useful? (also
> !(... %in% ...) though that would, e.g., include NAs.
>
>
> setMethod("%in%", c("AlignedRead", "RangesList"),
> function(x, table)
> {
> ## consider only sensible alignemnts
> chr <- chromosome(x)
> pos <- position(x)
> wd <- width(x)
> notNA <- !(is.na(chr) | is.na(pos) | is.na(wd))
> ## find overlap
> chr <- chr[notNA]
> rl <- RangesList(mapply(IRanges, start=split(pos[notNA], chr),
> width=split(wd[notNA], chr)))
> olap <- rl %in% table
> ## map to original indicies
> len <- seq_len(length(x))
> idx <- unlist(split(len[notNA], chr), use.names=FALSE)
> len %in% idx[unlist(olap)]
> })
>
> One would then
>
> aln[aln %in% rangesList]
>
> Martin
>
>> after. See Martin's post about something like this in this thread:
>>
>> https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-August/000509.html
>>
>> To get the reads *outside* of your ranges, maybe you can call the
>> ``gaps`` on your bed/ranges and then do the same thing ... or perhaps
>> ``setdiff(ranges, aln)`` might work, too? (where aln is your IRanges
>> converted alignedRead object (if necessary)).
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>> | Memorial Sloan-Kettering Cancer Center
>> | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
More information about the Bioc-sig-sequencing
mailing list