[Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?
Tim Triche, Jr.
tim.triche at gmail.com
Wed Jun 3 21:00:12 CEST 2015
It would be nice (for a number of reasons) to have chromosome lengths
readily available in a foundational package like GenomeInfoDb, so that,
say,
data(seqinfo.hg19)
seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ]
would work without issues. Is there any particular reason this couldn't
happen for the supported/available BSgenomes? It would seem like a simple
matter to do
R> library(BSgenome.Hsapiens.UCSC.hg19)
R> seqinfo.hg19 <- seqinfo(Hsapiens)
R> save(seqinfo.hg19,
        file="~/bioc-devel/GenomeInfoDb/data/seqinfo.hg19.rda")
and be done with it until (say) the next release or the next released
BSgenome. I considered looping through the following BSgenomes myself,
and unless there is strong opposition I may still do exactly that.
Seems useful, no?
e.g. for the following 42 builds,
grep("(UCSC|NCBI)", unique(sub("\\.masked$", "", available.genomes())),
     value=TRUE)
 [1] "BSgenome.Amellifera.UCSC.apiMel2"   "BSgenome.Btaurus.UCSC.bosTau3"
 [3] "BSgenome.Btaurus.UCSC.bosTau4"      "BSgenome.Btaurus.UCSC.bosTau6"
 [5] "BSgenome.Btaurus.UCSC.bosTau8"      "BSgenome.Celegans.UCSC.ce10"
 [7] "BSgenome.Celegans.UCSC.ce2"         "BSgenome.Celegans.UCSC.ce6"
 [9] "BSgenome.Cfamiliaris.UCSC.canFam2"  "BSgenome.Cfamiliaris.UCSC.canFam3"
[11] "BSgenome.Dmelanogaster.UCSC.dm2"    "BSgenome.Dmelanogaster.UCSC.dm3"
[13] "BSgenome.Dmelanogaster.UCSC.dm6"    "BSgenome.Drerio.UCSC.danRer5"
[15] "BSgenome.Drerio.UCSC.danRer6"       "BSgenome.Drerio.UCSC.danRer7"
[17] "BSgenome.Ecoli.NCBI.20080805"       "BSgenome.Gaculeatus.UCSC.gasAcu1"
[19] "BSgenome.Ggallus.UCSC.galGal3"      "BSgenome.Ggallus.UCSC.galGal4"
[21] "BSgenome.Hsapiens.NCBI.GRCh38"      "BSgenome.Hsapiens.UCSC.hg17"
[23] "BSgenome.Hsapiens.UCSC.hg18"        "BSgenome.Hsapiens.UCSC.hg19"
[25] "BSgenome.Hsapiens.UCSC.hg38"        "BSgenome.Mfascicularis.NCBI.5.0"
[27] "BSgenome.Mfuro.UCSC.musFur1"        "BSgenome.Mmulatta.UCSC.rheMac2"
[29] "BSgenome.Mmulatta.UCSC.rheMac3"     "BSgenome.Mmusculus.UCSC.mm10"
[31] "BSgenome.Mmusculus.UCSC.mm8"        "BSgenome.Mmusculus.UCSC.mm9"
[33] "BSgenome.Ptroglodytes.UCSC.panTro2" "BSgenome.Ptroglodytes.UCSC.panTro3"
[35] "BSgenome.Rnorvegicus.UCSC.rn4"      "BSgenome.Rnorvegicus.UCSC.rn5"
[37] "BSgenome.Rnorvegicus.UCSC.rn6"      "BSgenome.Scerevisiae.UCSC.sacCer1"
[39] "BSgenome.Scerevisiae.UCSC.sacCer2"  "BSgenome.Scerevisiae.UCSC.sacCer3"
[41] "BSgenome.Sscrofa.UCSC.susScr3"      "BSgenome.Tguttata.UCSC.taeGut1"
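The loop over these builds could be sketched roughly as follows. This is
only an illustration, not anything that exists in GenomeInfoDb: the helper
names (seqinfoName, saveSeqinfo, saveAllSeqinfo) and the "seqinfo.<build>"
naming scheme are assumptions, and builds whose last component is not a
tidy identifier (e.g. "5.0" or "20080805") would probably want a better
name. Packages that are not installed are skipped rather than downloaded.

```r
# Hypothetical helper: "BSgenome.Hsapiens.UCSC.hg19" -> "seqinfo.hg19".
# Strips the "BSgenome.<organism>.<provider>." prefix and keeps the rest.
seqinfoName <- function(pkg) {
  paste0("seqinfo.", sub("^BSgenome\\.[^.]+\\.[^.]+\\.", "", pkg))
}

# Hypothetical helper: save the Seqinfo of one BSgenome package as an .rda
# file in 'outdir'. Returns FALSE (and does nothing) if 'pkg' is not installed.
saveSeqinfo <- function(pkg, outdir) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(invisible(FALSE))
  genome  <- BSgenome::getBSgenome(pkg)        # load the genome object by package name
  objName <- seqinfoName(pkg)
  assign(objName, GenomeInfoDb::seqinfo(genome))  # seqnames, lengths, circularity, genome
  save(list = objName, file = file.path(outdir, paste0(objName, ".rda")))
  invisible(TRUE)
}

# Apply saveSeqinfo() to every package name; returns a named logical vector
# recording which builds were actually saved.
saveAllSeqinfo <- function(pkgs, outdir) {
  vapply(pkgs, saveSeqinfo, logical(1), outdir = outdir)
}

# Usage (needs BSgenome installed and network access for available.genomes()):
#   pkgs <- grep("(UCSC|NCBI)",
#                unique(sub("\\.masked$", "", BSgenome::available.genomes())),
#                value = TRUE)
#   saveAllSeqinfo(pkgs, "~/bioc-devel/GenomeInfoDb/data")
```

Saving one small .rda per build keeps each object loadable on its own via
data(), which is what the data(seqinfo.hg19) usage above assumes.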
Am I insane for suggesting this? It would make things a little easier for
rtracklayer, most SummarizedExperiment and SE-derived objects, blah, blah,
blah...
Best,
--t
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>