[Bioc-devel] important changes to the BSgenome packages

Hervé Pagès hpages at fhcrc.org
Tue Jan 14 20:24:27 CET 2014


Hi all,

Starting with BioC 2.14, the BSgenome data packages have been
modified as follows:

  1. Packages like BSgenome.Hsapiens.UCSC.hg19 that used to contain
     masked sequences no longer contain the masks, only the
     "naked" sequences.
     If you want the masked sequences, you need to use one of the
     new BSgenome.*.masked packages (e.g.
     BSgenome.Hsapiens.UCSC.hg19.masked).
     These new packages are lightweight packages that contain only
     the masks and re-use the sequences from the corresponding naked
     BSgenome package. Use them like a regular BSgenome package.
     In fact BSgenome.Hsapiens.UCSC.hg19.masked is equivalent to
     BSgenome.Hsapiens.UCSC.hg19 in BioC <= 2.13.

     See available.genomes() for the list of BSgenome packages currently
     available. Note that not all naked BSgenome packages have a
     corresponding BSgenome.*.masked package.

  2. The sequences are now stored in a way that allows fast random access.
     As a consequence, getSeq() is faster when extracting small portions
     of the genome. Currently it's also slower when loading a full
     chromosome but this might be addressed in the future.

  3. The upstream sequences are deprecated. A better way to get them is
     to use genes() and flank() on a TranscriptDb object followed by
     getSeq() on the BSgenome object. The deprecation message shows how
     to do this.

  4. For consistency with other annotation packages, the main object in
     a BSgenome package is now named after the package itself, e.g.
     BSgenome.Hsapiens.UCSC.hg19. The old object (e.g. Hsapiens) is still
     available but will be deprecated at some point.
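
To make the four points above more concrete, here is a minimal sketch of
how they might look in an R session. It assumes the BioC >= 2.14 versions
of BSgenome.Hsapiens.UCSC.hg19 and BSgenome.Hsapiens.UCSC.hg19.masked are
installed. The annotation package TxDb.Hsapiens.UCSC.hg19.knownGene, the
choice of chr22, the example coordinates and the variable names are
assumptions made for illustration only; the deprecation message remains
the reference for replacing the upstream sequences.

```r
# Sketch only: assumes the hg19 data packages below are installed.
library(BSgenome.Hsapiens.UCSC.hg19)
library(BSgenome.Hsapiens.UCSC.hg19.masked)

# Item 4: the main object is named after its package.
genome <- BSgenome.Hsapiens.UCSC.hg19
masked <- BSgenome.Hsapiens.UCSC.hg19.masked

# Item 1: the naked package returns plain sequences, the .masked
# package returns the same sequences with the masks on top.
naked_chr22  <- genome$chr22
masked_chr22 <- masked$chr22
class(naked_chr22)
class(masked_chr22)

# Item 2: extract a small region (arbitrary example coordinates).
region <- GRanges("chr1", IRanges(start = 1000001, width = 100))
region_seq <- getSeq(genome, region)

# Item 3: upstream sequences via genes() + flank() + getSeq().
# The annotation package used here is an assumption, not named above.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
gn <- genes(txdb)
gn <- gn[seqnames(gn) == "chr22"]        # keep the example small
up1000 <- trim(flank(gn, width = 1000))  # strand-aware 1 kb upstream
up1000_seqs <- getSeq(genome, up1000)
```

The trim() call is only a precaution against flanks that would extend
past the end of a chromosome.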

The devel repositories also contain a new BSgenome package for the
latest release of the Human genome:

   BSgenome.Hsapiens.NCBI.GRCh38

Please let me know if you have questions or concerns about this.

Thanks,
H.


