[BioC] extract introns

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Nov 11 05:33:03 CET 2011


Hi Martin,

On Thu, Nov 10, 2011 at 10:37 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 11/09/2011 03:22 AM, Yating Cheng wrote:
>>
>> Dear Bioconductor Members,
>>
>> I now have to extract intron sequences. I already have exon+intron, intron
>> sequences. Someone told me that I can use Biostrings, but when I tried it,
>> it failed.
>>
>> Do you know how to use Biostrings to solve this problem, or is there any
>> other way to solve it?
>
> Steve and Ivan mention GRanges + BSgenome. I wanted to mention
> Rsamtools::FaFile with indexFa, which allows input of DNA sequence from
> fasta files rather than BSgenome packages. You gain flexibility in terms of
> where the sequence comes from, but lose BSgenome features like masking and
> data management.

That's super cool! Thanks for adding that.
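
For anyone following along, here is a minimal sketch of what the
FaFile route might look like. The toy FASTA content, the temp file,
and the intron coordinates are all made up for illustration; only
indexFa, FaFile and scanFa come from Rsamtools, and GRanges/IRanges
from GenomicRanges. Treat it as a sketch under those assumptions,
not as the recommended recipe.

```r
library(Rsamtools)      # indexFa(), FaFile(), scanFa()
library(GenomicRanges)  # GRanges(), IRanges()

# Hypothetical toy FASTA written to a temp file; substitute your own file.
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">chr1", "ACGTACGTACGTACGTACGT"), fa_path)

# Build the .fai index once, then open a reference to the indexed file.
indexFa(fa_path)
fa <- FaFile(fa_path)

# Hypothetical intron coordinates (1-based, inclusive).
introns <- GRanges("chr1", IRanges(start = c(1, 3), end = c(4, 6)))

# Read only the requested ranges from disk; the result is a DNAStringSet.
intron_seqs <- scanFa(fa, param = introns)
```

As far as I can tell, scanFa returns the sequence as stored in the
file, so introns on the minus strand would still need
reverseComplement() from Biostrings.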

This means that we have to keep those *.fa files uncompressed though,
right? I wonder if it's worth thinking about (for R 2.15 ;-) handling
compressed *.fa files using BGZF along these lines, too:

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


