[BioC] BSgenome package for a. thaliana

Hervé Pagès hpages at fhcrc.org
Wed Dec 15 01:02:05 CET 2010


Hi Oleg,

I finally managed to make a BSgenome package for TAIR9.

Release notes for BSgenome.Athaliana.TAIR.TAIR9:

   - TAIR9 and TAIR10 correspond to the same genome assembly
     so there is no need for a BSgenome pkg for TAIR10 :-)

   - Sequences in TAIR9 are named Chr1, Chr2, ..., ChrM, ChrC
     instead of chr1, chr2, ..., chrM, chrC as in the previous
     BSgenome pkg (i.e. in BSgenome.Athaliana.TAIR.04232008).
     The problem is that at least 3 different naming conventions
     are used concurrently at TAIR. Even within the same genome
     release, chromosomes are not named the same way in FASTA
     files, GFF files, filenames, etc. However, Chr1, Chr2, ...,
     ChrM, ChrC seems to be the most widely used sequence naming
     convention (at least that's what appears in the GFF files
     I looked at).

   - Sequences have been slightly reordered in the BSgenome pkg
     for TAIR9: ChrM now precedes ChrC (this is to be more consistent
     with BSgenome pkgs for other organisms).

   - Seqlengths:

      > library(BSgenome.Athaliana.TAIR.TAIR9)
      > seqlengths(Athaliana)
          Chr1     Chr2     Chr3     Chr4     Chr5     ChrM     ChrC
      30427671 19698289 23459830 18585056 26975502   366924   154478

   - Sequences Chr1 to Chr5 have changed in TAIR9 with respect to
     the previous BSgenome pkg, but the ChrM and ChrC sequences
     have not.

   - The new package still doesn't contain any built-in masks (the
     locations/sizes of the assembly gaps provided by TAIR seem to
     be wrong as they don't correspond to the N-blocks found in the
     sequences).
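
The check behind the last item (do the annotated gaps line up with the
N-blocks actually present in a sequence?) can be sketched in base R. This is
only an illustration under stated assumptions: `toy_seq` and `annotated_gaps`
are made-up toy data, not TAIR9 sequences or TAIR gap annotations, and
coordinates are assumed to be 1-based and inclusive.

```r
# Toy sequence standing in for a chromosome (hypothetical data, not TAIR9).
toy_seq <- "ACGTNNNNNACGTACGNNNACGT"

# Locate maximal runs of N with a base R regular expression.
m <- gregexpr("N+", toy_seq)[[1]]
n_blocks <- data.frame(start = as.integer(m),
                       width = attr(m, "match.length"))
n_blocks$end <- n_blocks$start + n_blocks$width - 1L

# Hypothetical gap annotation in the same 1-based inclusive coordinates.
annotated_gaps <- data.frame(start = c(5L, 17L), end = c(9L, 20L))

# An annotated gap is consistent only if it coincides exactly with an N-block.
annotated_gaps$matches_n_block <-
  paste(annotated_gaps$start, annotated_gaps$end) %in%
  paste(n_blocks$start, n_blocks$end)

print(n_blocks)
print(annotated_gaps)
```

In this toy case the first annotated gap coincides with an N-block and the
second does not, which is the kind of mismatch that would make the
annotation unsuitable as a built-in mask.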

BSgenome.Athaliana.TAIR.TAIR9 will become available in release and
devel in about 1 hour (source packages only for now).

Also, this is the end of life for BSgenome.Athaliana.TAIR.01222004
and I will drop it from devel.

Please let me know if you have any questions.

Cheers,
H.


On 12/07/2010 10:42 AM, Hervé Pagès wrote:
> Hi Oleg,
>
> On 12/06/2010 07:25 PM, oleg at stat.berkeley.edu wrote:
>> Hi, all
>> I want to use genome package corresponding to TAIR9 version of a.thaliana
>> genome. It seems that BSgenome makes 2 genome versions available:
>> "BSgenome.Athaliana.TAIR.01222004" and
>> "BSgenome.Athaliana.TAIR.04232008".
>> After checking them out, they actually seem to be the same and represent
>> an earlier version of the genome (TAIR8?).
>
> They are not the same:
>
>  > alphabetFrequency(BSgenome.Athaliana.TAIR.01222004::Athaliana$chr1)
>       A       C       G       T       M       R       W       S       Y       K
> 9711178 5436538 5422303 9698578      76      37     124      31      85      53
>       V       H       D       B       N       -       +
>       0       0       0       0  163560       0       0
>  > alphabetFrequency(BSgenome.Athaliana.TAIR.04232008::Athaliana$chr1)
>       A       C       G       T       M       R       W       S       Y       K
> 9709677 5435365 5421130 9697107      76      36     124      30      82      53
>       V       H       D       B       N       -       +
>       0       0       0       0  168883       0       0
>
>
>> I could probably try to put
>> together TAIR9 genome using BSgenome manual, but I thought there might be
>> a package out there already, since TAIR9 has been around for a while now
>> (TAIR10 was released last month). If someone knows of one, please let
>> me know!
>
> We'll take care of making those 2. As you pointed out, the ones we have
> are pretty old now and we really need to provide something more recent.
> I'll post back here when BSgenomes for TAIR9 and TAIR10 are available.
>
> Cheers,
> H.
>
>>
>> Oleg.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


