[BioC] Fetch scaffold sequences from BSgenome package
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Tue Sep 18 21:02:01 CEST 2012
Herve,
Thanks so much for the quick response and the alternative solutions!
Best regards,
Julie
On 9/18/12 2:49 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
> Hi Julie,
>
> On 09/18/2012 09:05 AM, Zhu, Lihua (Julie) wrote:
>> Herve,
>>
>> It seems that the getSeq function does not work for scaffold sequence
>> such as Zv9_scaffold3564:93,507-93,556 and Zv9_NA384:3,507-3,556 for
>> BSgenome.Drerio.UCSC.danRer7_1.3.17. What is the best way to obtain such
>> sequences using BSgenome package? Many thanks!
>
> Indeed:
>
>> library(BSgenome.Drerio.UCSC.danRer7)
>> getSeq(Drerio, "Zv9_scaffold3564")
> Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i],
> start[i], :
> sequence Zv9_scaffold3564 found more than once, please use a
> non-ambiguous name
>
> The problem is this:
>
>> grep("Zv9_scaffold3564", names(Drerio$Zv9_scaffold), value=TRUE)
> [1] "Zv9_scaffold3564"
>> grep("Zv9_scaffold3564", names(Drerio$upstream1000), value=TRUE)
> [1] "ENSDART00000099775_up_1000_Zv9_scaffold3564_137113_r
> Zv9_scaffold3564:137113-138112"
> [2] "ENSDART00000062791_up_1000_Zv9_scaffold3564_129501_r
> Zv9_scaffold3564:129501-130500"
>
> and also that getSeq() here is looking for a sequence that *contains*
> "Zv9_scaffold3564" in its name.
>
> Note that when used with a GRanges object, getSeq() is looking for a
> sequence with the exact specified name so that should address the
> problem:
>
>> getSeq(Drerio, GRanges("Zv9_scaffold3564", IRanges(3507, 3556)))
> A DNAStringSet instance of length 1
> width seq
> [1] 50 CCTAAGTATCCACTTTAGTATCCATAACACAATAATCAGATGCTATTGTT
>
> Note that our plan for BioC 2.12 is to get rid of the upstream sequences
> in the BSgenome packages (they should never have been included here in
> the first place, people will be able to use Paul's new getPromoterSeq()
> from the GenomicFeatures package instead) so that will entirely solve
> the ambiguous name problem.
>
> Let me know if you have any questions about this.
>
> Thanks,
> H.
>
>>
>> Best regards,
>>
>> Julie
More information about the Bioconductor
mailing list