[Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

Tim Triche, Jr. tim.triche at gmail.com
Wed Jun 3 21:00:12 CEST 2015


It would be nice (for a number of reasons) to have chromosome lengths
readily available in a foundational package like GenomeInfoDb, so that,
say,

data(seqinfo.hg19)
seqinfo(myResults) <- seqinfo.hg19[ seqlevels(myResults) ]

would work without issues.  Is there any particular reason this couldn't
happen for the supported/available BSgenomes?  It would seem like a simple
matter to do

R> library(BSgenome.Hsapiens.UCSC.hg19)
R> seqinfo.hg19 <- seqinfo(Hsapiens)
R> save(seqinfo.hg19, file="~/bioc-devel/GenomeInfoDb/data/seqinfo.hg19.rda")

and be done with it until (say) the next release or next released
BSgenome.  I considered looping through the following BSgenomes myself...
and if it isn't strongly opposed by (everyone) I may still do exactly
that.  Seems useful, no?
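
A rough sketch of what that loop could look like. The helper names
seqinfoName() and saveSeqinfo() are made up for illustration and are not part
of GenomeInfoDb or BSgenome; the "seqinfo.<build>" naming simply follows the
seqinfo.hg19 example above. It assumes the BSgenome packages of interest are
already installed and skips any that are not.

```r
# Sketch only: seqinfoName() and saveSeqinfo() are hypothetical helpers,
# not part of GenomeInfoDb or BSgenome.

# Derive a dataset name such as "seqinfo.hg19" from a BSgenome package name
# by dropping the "BSgenome.<organism>.<provider>." prefix.
seqinfoName <- function(pkg) {
  paste0("seqinfo.", sub("^BSgenome\\.[^.]+\\.[^.]+\\.", "", pkg))
}

# Save the Seqinfo of one installed BSgenome package to <outdir>/<name>.rda.
# Returns the file path, or NA if the package is not installed.
saveSeqinfo <- function(pkg, outdir) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  name <- seqinfoName(pkg)
  # Bind the Seqinfo to a variable called e.g. seqinfo.hg19 in this frame,
  # so that load()/data() later restores it under that name.
  assign(name, GenomeInfoDb::seqinfo(BSgenome::getBSgenome(pkg)))
  path <- file.path(outdir, paste0(name, ".rda"))
  save(list = name, file = path)
  path
}
```

Given a character vector pkgs of package names (such as the list below), one
call along the lines of vapply(pkgs, saveSeqinfo, character(1), outdir="data")
would write one .rda file per installed build.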

e.g. for the following 42 builds,

grep("(UCSC|NCBI)", unique(gsub(".masked", "", available.genomes(), fixed=TRUE)),
     value=TRUE)
 [1] "BSgenome.Amellifera.UCSC.apiMel2"   "BSgenome.Btaurus.UCSC.bosTau3"
 [3] "BSgenome.Btaurus.UCSC.bosTau4"      "BSgenome.Btaurus.UCSC.bosTau6"
 [5] "BSgenome.Btaurus.UCSC.bosTau8"      "BSgenome.Celegans.UCSC.ce10"
 [7] "BSgenome.Celegans.UCSC.ce2"         "BSgenome.Celegans.UCSC.ce6"
 [9] "BSgenome.Cfamiliaris.UCSC.canFam2"  "BSgenome.Cfamiliaris.UCSC.canFam3"
[11] "BSgenome.Dmelanogaster.UCSC.dm2"    "BSgenome.Dmelanogaster.UCSC.dm3"
[13] "BSgenome.Dmelanogaster.UCSC.dm6"    "BSgenome.Drerio.UCSC.danRer5"
[15] "BSgenome.Drerio.UCSC.danRer6"       "BSgenome.Drerio.UCSC.danRer7"
[17] "BSgenome.Ecoli.NCBI.20080805"       "BSgenome.Gaculeatus.UCSC.gasAcu1"
[19] "BSgenome.Ggallus.UCSC.galGal3"      "BSgenome.Ggallus.UCSC.galGal4"
[21] "BSgenome.Hsapiens.NCBI.GRCh38"      "BSgenome.Hsapiens.UCSC.hg17"
[23] "BSgenome.Hsapiens.UCSC.hg18"        "BSgenome.Hsapiens.UCSC.hg19"
[25] "BSgenome.Hsapiens.UCSC.hg38"        "BSgenome.Mfascicularis.NCBI.5.0"
[27] "BSgenome.Mfuro.UCSC.musFur1"        "BSgenome.Mmulatta.UCSC.rheMac2"
[29] "BSgenome.Mmulatta.UCSC.rheMac3"     "BSgenome.Mmusculus.UCSC.mm10"
[31] "BSgenome.Mmusculus.UCSC.mm8"        "BSgenome.Mmusculus.UCSC.mm9"
[33] "BSgenome.Ptroglodytes.UCSC.panTro2" "BSgenome.Ptroglodytes.UCSC.panTro3"
[35] "BSgenome.Rnorvegicus.UCSC.rn4"      "BSgenome.Rnorvegicus.UCSC.rn5"
[37] "BSgenome.Rnorvegicus.UCSC.rn6"      "BSgenome.Scerevisiae.UCSC.sacCer1"
[39] "BSgenome.Scerevisiae.UCSC.sacCer2"  "BSgenome.Scerevisiae.UCSC.sacCer3"
[41] "BSgenome.Sscrofa.UCSC.susScr3"      "BSgenome.Tguttata.UCSC.taeGut1"

Am I insane for suggesting this?  It would make things a little easier for
rtracklayer, most SummarizedExperiment and SE-derived objects, blah, blah,
blah...


Best,

--t




Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
