[BioC] how to use rtracklayer to retrieve sequence of list of coordinates
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Mar 1 20:31:37 CET 2010
Hi Sabrina,
On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at gmail.com> wrote:
> Hi,all:
> I have a long list of coordinates, which was created by the following code:
> testIranges<- IRanges(start=currCoorList $starts,
> end=currCoorLis$ends)
>
> testTrack<-GenomicData(testIranges,chrom=currCoorList$chr,
> strand=currCoorList$strand,genome="mm9")
>
> I could then export the testTrack into .bed file and upload in the UCSC
> genome browser and get the sequences of the listed coordinate
> But I was thinking if there is a function from rtracklayer to retrieve
> sequence directly.
If you're trying to get coordinates from some reference genome, you
can use the BSgenome.*.* packages and skip rtracklayer + internet
queries entirely.
For instance, say I have an IRanges object `r1` which is a set of
intervals on chromosome 1 from hg18 that I want sequence information
for, I could do it like this:
R> r1
IRanges of length 589
start end width
[1] 18016124 18016155 32
[2] 18020749 18020780 32
[3] 18024599 18024630 32
[4] 18024830 18024861 32
[5] 18051985 18052016 32
[6] 18064760 18064791 32
[7] 18088145 18088176 32
[8] 18088200 18088231 32
[9] 18110085 18110116 32
...
R> library(BSgenome.Hsapiens.UCSC.hg18)
R> chr1 <- unmasked(HSapiens$chr1)
R> myseqs <- Views(chr1, start=start(r1), end=end(r1))
R> myseqs
Views on a 247249719-letter DNAString subject
subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
start end width
[1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA]
[2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG]
[3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC]
[4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT]
[5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC]
[6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG]
[7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA]
[8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC]
[9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT]
[10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT]
...
calling "as.character" on the myseqs object will give you the sequence
as a character vector.
Does that get you what you're after?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list