[Bioc-sig-seq] Extracting DNA sequences from BSgenome.Mmusculus.UCSC.mm9_1.3.11

Ivan Gregoretti ivangreg at gmail.com
Fri May 29 18:04:11 CEST 2009


Hi Hervé,


> With BSgenome 1.12.1 (release) and 1.13.5 (devel) you can now do:
>
>  myseqs <- data.frame(
>    chr=c("chrY", "chr1", "chr2", "chr3", "chrY", "chr3", "chr1", "chr1"),
>    start=c(NA, -40, 8510201, 4920301, 30001, 9220500, -2804, -30),
>    end=c(50, NA, 8510220, 4920330, 30011, 9220555, -2801, -11)
>  )
>
>  library(BSgenome.Mmusculus.UCSC.mm9)
>
>  > getSeq(Mmusculus, myseqs$chr, myseqs$start, myseqs$end)
>  [1] "GATCCAAAACACATTCTCCCTGGTAGCATGGACAAGCAACATTTTGGGAG"
>  [2] "TTCTGTAAAGAATTTGGTATTAAACTTAAAACTGGAATTC"
>  [3] "ACGACTATAAAAACCTTTAG"
>  [4] "CATACAATAATTGTGGGGGAACTTCAAAAC"
>  [5] "ATCTTAATCAC"
>  [6] "CAGTAGTGGCGTACACCTTTAATCCCAGCACGTGGTAGGCAGAGGCAGATGGATTT"
>  [7] "ATGA"
>  [8] "AATTTGGTATTAAACTTAAA"
>
> to extract multiple subsequences from multiple chromosomes at once.
> (Note support for NAs and negative start or end.)
>

So, getSeq is vectorised now. Great. That addresses a very common use of getSeq.
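
Judging only from the quoted output, an NA start seems to mean "from the first base", an NA end "to the last base", and a negative coordinate seems to count back from the end of the chromosome (so -1 would be the last base). Here is a minimal base-R sketch of that reading. The helper resolve_range() is hypothetical and not part of BSgenome, and the toy string is made-up data rather than mouse sequence.

```r
# Hypothetical helper (NOT a BSgenome function): resolve one start/end pair
# against a sequence of length `seqlen`, using the rule inferred from the
# quoted output: NA start -> 1, NA end -> seqlen, negative -> from the end.
resolve_range <- function(start, end, seqlen) {
  if (is.na(start)) start <- 1                # NA start: begin at the first base
  if (is.na(end)) end <- seqlen               # NA end: run to the last base
  if (start < 0) start <- seqlen + start + 1  # -1 maps to the last base
  if (end < 0) end <- seqlen + end + 1
  c(start = start, end = end, width = end - start + 1)
}

toy <- "ACGTACGTACGTACGTACGT"                 # 20-base toy sequence
r <- resolve_range(-8, -5, nchar(toy))
substr(toy, r[["start"]], r[["end"]])         # bases 13 to 16 of the toy string
```

Under this reading the widths match the quoted example regardless of chromosome length: (NA, 50) gives 50 bases, (-40, NA) gives 40, (-30, -11) gives 20 and (-2804, -2801) gives 4.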


>
> Hopefully this time you won't get hit by the infamous bug you reported
> earlier (BTW anything new on that front? Were you able to reproduce it?
> Thanks).
>

Bug? The last time I was in real trouble, I solved my problem with
Michael's suggestions on the use of RangedData. But that was a feature
rather than a bug. Bottom line: I stick to RangedData now because it
is relatively easy to manipulate.

Thank you,

Ivan


Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878
