[Bioc-devel] important changes to the BSgenome packages
Hervé Pagès
hpages at fhcrc.org
Tue Jan 14 20:24:27 CET 2014
Hi all,
Starting with BioC 2.14, the BSgenome data packages have been
modified as follows:
1. Packages like BSgenome.Hsapiens.UCSC.hg19 that used to contain
masked sequences no longer contain the masks, only the
"naked" sequences.
If you want the masked sequences, you need to use one of the
new BSgenome.*.masked packages (e.g.
BSgenome.Hsapiens.UCSC.hg19.masked).
These new packages are lightweight packages that contain only
the masks and re-use the sequences from the corresponding naked
BSgenome package. Use them like a regular BSgenome package.
In fact, BSgenome.Hsapiens.UCSC.hg19.masked is equivalent to
BSgenome.Hsapiens.UCSC.hg19 as it was in BioC <= 2.13.
See available.genomes() for the list of BSgenome packages currently
available. Note that not all naked BSgenome packages have a
corresponding BSgenome.*.masked package.
2. The sequences are now stored in a way that allows fast random access.
As a consequence, getSeq() is faster when extracting small portions
of the genome. Currently it's also slower when loading a full
chromosome, but this might be addressed in the future.
3. The upstream sequences are deprecated. A better way to get them is
to use genes() and flank() on a TranscriptDb object followed by
getSeq() on the BSgenome object. The deprecation message shows how
to do this.
4. For consistency with other annotation packages, the main object in
a BSgenome package is now named after the package itself, e.g.
BSgenome.Hsapiens.UCSC.hg19. The old object (e.g. Hsapiens) is still
available but will be deprecated at some point.
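To make the four points above concrete, here is a minimal sketch of how they could look in an R session. It is only an illustration, not the text of the deprecation message: the coordinates on chr1 and the 1000 bp flank width are arbitrary, and the TxDb.Hsapiens.UCSC.hg19.knownGene annotation package is an assumption (any transcript database matching hg19 should work the same way). The trim() call is an extra precaution so that flanks never extend past a sequence end.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)

# Point 4: the main object is named after the package itself.
genome <- BSgenome.Hsapiens.UCSC.hg19

# Point 2: extracting a small region only reads that part of the genome.
# The coordinates below are arbitrary and chosen for illustration only.
region <- GRanges("chr1", IRanges(start = 1000001, width = 50))
small_seq <- getSeq(genome, region)

# Point 1: the masks now live in the separate *.masked package,
# which is used like a regular BSgenome package.
library(BSgenome.Hsapiens.UCSC.hg19.masked)
masked_genome <- BSgenome.Hsapiens.UCSC.hg19.masked
chr1_masked <- masked_genome$chr1   # a MaskedDNAString (loads all of chr1)

# Point 3: upstream sequences via genes() + flank() + getSeq().
# Assumption: this annotation package is not mentioned in the announcement.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gn <- genes(txdb)
# flank() is strand-aware and takes the upstream side by default;
# trim() clips any range that would run off the end of a sequence.
up1000 <- trim(flank(gn, width = 1000))
up1000seqs <- getSeq(genome, up1000)
```

Loading a full masked chromosome as in `masked_genome$chr1` is the slow case mentioned in point 2, so for small regions it is preferable to go through getSeq() with a GRanges object.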
The devel repositories also contain a new BSgenome package for the
latest release of the human genome:
BSgenome.Hsapiens.NCBI.GRCh38
Please let me know if you have questions or concerns about this.
Thanks,
H.