[BioC] how to use rtracklayer to retrieve sequence of list of coordinates
Hervé Pagès
hpages at fhcrc.org
Wed Mar 3 03:03:21 CET 2010
Hi Sabrina, Michael,
It doesn't work for me either with a RangedData:
> testTrack <- RangedData(
space=c("chr1", "chrY"),
ranges=IRanges(start=1:1000, width=50),
strand=c("+", "+", "-", "-"))
> library(BSgenome.Mmusculus.UCSC.mm9)
> getSeq(Mmusculus, testTrack)
Error in IRanges:::.normargSEW(start, "start") :
'start' must be a vector of integers
But you can work around this by using:
seq <- getSeq(Mmusculus, space(testTrack),
start=start(testTrack), end=end(testTrack),
strand=strand(testTrack))
See ?getSeq for more information.
BTW it seems that your installation is not up-to-date (for example
you have Biostrings 2.14.5, but the current version is 2.14.12).
I strongly recommend that you update your installation first with:
> source("http://bioconductor.org/biocLite.R")
> update.packages(repos=biocinstallRepos(), ask=FALSE)
Cheers,
H.
sabrina s wrote:
> Hi, Michael:
>
> See bellow:
>
>
> Error in IRanges:::.normargSEW(start, "start") :
> 'start' must be a vector of integers
>> sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2
> [3] Biostrings_2.14.5 rtracklayer_1.6.0
> [5] RCurl_1.3-1 bitops_1.0-4.1
> [7] IRanges_1.4.10
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0 XML_2.6-0
>
>
> Thanks!
>
> Sabrina
>
>
> On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence
> <lawrence.michael at gene.com>wrote:
>
>> Could you paste your sessionInfo() output please?
>>
>>
>> On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao at gmail.com> wrote:
>>
>>> Hi, Michael:
>>>
>>> I tried as you suggested, but I had the following error:
>>>
>>> Error in IRanges:::.normargSEW(start, "start") :
>>> 'start' must be a vector of integers
>>>
>>> my testTrack looks like this:
>>>
>>> RangedData with 76 rows and...
>>> space ranges | strand
>>> <character> <IRanges> | <factor>
>>> 1 chr1 [137159091, 137159290] | -
>>> 2 chr1 [155302742, 155302941] | -
>>> etc
>>>
>>> Any suggestions? Thanks!
>>>
>>> Sabrina
>>>
>>>
>>> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence <
>>> lawrence.michael at gene.com> wrote:
>>>
>>>>
>>>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao at gmail.com>wrote:
>>>>
>>>>> Hi, Steve:
>>>>> I am trying to get it worked without separating the chromosomes. I used
>>>>> the
>>>>> following code:
>>>>> seqQuery<-data.frame(
>>>>> chr=currCoorList$chr,
>>>>> start=currCoorList$starts,
>>>>> end=currCoorList$ends,
>>>>> strand=currCoorList$strand)
>>>>> mySeq<- getSeq(Mmusculus, seqQuery$chr,
>>>>> start=seqQuery$start, end=seqQuery$end,
>>>>> strand=seqQuery$strand )
>>>>>
>>>>>
>>>>> But then I got an error:
>>>>> Error in reverseComplement(DNAStringSet(seqs[ii])) :
>>>>> could not find function "copy"
>>>>>
>>>> Did I miss some depending package?
>>>>>
>>>> Don't know what's up with that, but you don't need to use a data.frame
>>>> there. You already had a RangedData above, just pass that to getSeq, i.e.
>>>>
>>>> getSeq(Mmusculus, testTrack)
>>>>
>>>> Sabrina
>>>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou <
>>>>> mailinglist.honeypot at gmail.com> wrote:
>>>>>
>>>>>> Hi Sabrina,
>>>>>>
>>>>>> On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at gmail.com>
>>>>> wrote:
>>>>>>> Hi,all:
>>>>>>> I have a long list of coordinates, which was created by the
>>>>> following
>>>>>> code:
>>>>>>> testIranges<- IRanges(start=currCoorList $starts,
>>>>>>> end=currCoorLis$ends)
>>>>>>>
>>>>>>> testTrack<-GenomicData(testIranges,chrom=currCoorList$chr,
>>>>>>> strand=currCoorList$strand,genome="mm9")
>>>>>>>
>>>>>>> I could then export the testTrack into .bed file and upload in the
>>>>> UCSC
>>>>>>> genome browser and get the sequences of the listed coordinate
>>>>>>> But I was thinking if there is a function from rtracklayer to
>>>>> retrieve
>>>>>>> sequence directly.
>>>>>> If you're trying to get coordinates from some reference genome, you
>>>>>> can use the BSgenome.*.* packages and skip rtracklayer + internet
>>>>>> queries entirely.
>>>>>>
>>>>>> For instance, say I have an IRanges object `r1` which is a set of
>>>>>> intervals on chromosome 1 from hg18 that I want sequence information
>>>>>> for, I could do it like this:
>>>>>>
>>>>>> R> r1
>>>>>> IRanges of length 589
>>>>>> start end width
>>>>>> [1] 18016124 18016155 32
>>>>>> [2] 18020749 18020780 32
>>>>>> [3] 18024599 18024630 32
>>>>>> [4] 18024830 18024861 32
>>>>>> [5] 18051985 18052016 32
>>>>>> [6] 18064760 18064791 32
>>>>>> [7] 18088145 18088176 32
>>>>>> [8] 18088200 18088231 32
>>>>>> [9] 18110085 18110116 32
>>>>>> ...
>>>>>>
>>>>>> R> library(BSgenome.Hsapiens.UCSC.hg18)
>>>>>> R> chr1 <- unmasked(HSapiens$chr1)
>>>>>> R> myseqs <- Views(chr1, start=start(r1), end=end(r1))
>>>>>> R> myseqs
>>>>>> Views on a 247249719-letter DNAString subject
>>>>>> subject:
>>>>>>
>>>>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>>>>>> views:
>>>>>> start end width
>>>>>> [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA]
>>>>>> [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG]
>>>>>> [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC]
>>>>>> [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT]
>>>>>> [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC]
>>>>>> [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG]
>>>>>> [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA]
>>>>>> [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC]
>>>>>> [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT]
>>>>>> [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT]
>>>>>> ...
>>>>>>
>>>>>> calling "as.character" on the myseqs object will give you the sequence
>>>>>> as a character vector.
>>>>>>
>>>>>> Does that get you what you're after?
>>>>>> -steve
>>>>>>
>>>>>> --
>>>>>> Steve Lianoglou
>>>>>> Graduate Student: Computational Systems Biology
>>>>>> | Memorial Sloan-Kettering Cancer Center
>>>>>> | Weill Medical College of Cornell University
>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact<http://cbio.mskcc.org/%7Elianos/contact>
>>>>> <http://cbio.mskcc.org/%7Elianos/contact>
>>>>>
>>>>>
>>>>> --
>>>>> Sabrina
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>
>>> --
>>> Sabrina
>>>
>>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list