[BioC] how to use rtracklayer to retrieve sequence of list of	coordinates
    Steve Lianoglou 
    mailinglist.honeypot at gmail.com
       
    Mon Mar  1 20:31:37 CET 2010
    
    
  
Hi Sabrina,
On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at gmail.com> wrote:
> Hi,all:
> I have a long list of coordinates, which was created by the following code:
> testIranges<- IRanges(start=currCoorList $starts,
>        end=currCoorLis$ends)
>
>   testTrack<-GenomicData(testIranges,chrom=currCoorList$chr,
>        strand=currCoorList$strand,genome="mm9")
>
> I could then export the testTrack into .bed file and upload in the UCSC
> genome browser and get the sequences of the listed coordinate
> But I was thinking if there is a function from rtracklayer to retrieve
> sequence directly.
If you're trying to get coordinates from some reference genome, you
can use the BSgenome.*.* packages and skip rtracklayer + internet
queries entirely.
For instance, say I have an IRanges object `r1` which is a set of
intervals on chromosome 1 from hg18 that I want sequence information
for, I could do it like this:
R> r1
IRanges of length 589
        start      end width
[1]   18016124 18016155    32
[2]   18020749 18020780    32
[3]   18024599 18024630    32
[4]   18024830 18024861    32
[5]   18051985 18052016    32
[6]   18064760 18064791    32
[7]   18088145 18088176    32
[8]   18088200 18088231    32
[9]   18110085 18110116    32
...
R> library(BSgenome.Hsapiens.UCSC.hg18)
R> chr1 <- unmasked(HSapiens$chr1)
R> myseqs <- Views(chr1, start=start(r1), end=end(r1))
R> myseqs
 Views on a 247249719-letter DNAString subject
subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
       start      end width
[1] 18016124 18016155    32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA]
[2] 18020749 18020780    32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG]
[3] 18024599 18024630    32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC]
[4] 18024830 18024861    32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT]
[5] 18051985 18052016    32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC]
[6] 18064760 18064791    32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG]
[7] 18088145 18088176    32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA]
[8] 18088200 18088231    32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC]
[9] 18110085 18110116    32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT]
[10] 18118454 18118485    32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT]
...
calling "as.character" on the myseqs object will give you the sequence
as a character vector.
Does that get you what you're after?
-steve
-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
    
    
More information about the Bioconductor
mailing list