[BioC] making use of the Apis mellifera BeeBase assembly 4 data in goseq

Hervé Pagès hpages at fhcrc.org
Fri Feb 24 21:51:37 CET 2012


Hello Vanessa,

On 02/24/2012 10:45 AM, Corby, Vanessa wrote:
> Hello Hervé and Matt,
>
> After looking through the Bioconductor documentation for the BeeBase
> assembly 4 package Hervé posted (information on the Apis 4 annotation
> stored in Biostrings objects), the documentation for the org.Hs.eg.db
> annotation database, the Bioconductor mailing list, the
> BSgenome documentation, and the goseq documentation, I am still very
> confused about whether I can use the assembly 4 package that Hervé
> posted in goseq.

Just to clarify, goseq is not my package, so I can't "post" anything
in it, whatever that means. I assume you are talking about the
BSgenome.Amellifera.BeeBase.assembly4 package that I made and that
is part of Bioconductor.

> The reason that I want to use the assembly 4 data is
> that I would presume that it will have more current information than the
> natively supported (by goseq) Apis release 2.

It's a more recent assembly so I would expect it to be more accurate
(i.e. closer to reality).

>
> So, here are my questions:
>
> 1. Will release 4 offer much improvement over release 2? If this is not
> the case, then the next two questions are moot.

It's just a more recent assembly, with all that that implies.

>
> 2. Do I need to get information on the transcript lengths and the
> associations between the gene IDs and GO terms for the Apis 4 release and
> build 2 new files of this information for goseq to use?

I'm not familiar with the goseq package so I'll let Matt answer this.

> Is that
> information available (perhaps through UCSC or Baylor’s site for the
> honeybee projects)? Can I use Bioconductor for this if I have the
> annotation database file Hervé posted?

The BSgenome.Amellifera.BeeBase.assembly4 package only contains the
DNA sequences of the Apis 4 release. It does *not* contain annotations
for this assembly.

One advantage of using the BSgenome.Amellifera.UCSC.apiMel2 package
instead is that you have easy access to a world of annotations for
this genome through the UCSC Genome Browser. Too bad the UCSC folks
have no plans to support apiMel4:

   https://lists.soe.ucsc.edu/pipermail/genome/2007-October/014763.html

apiMel2 is 7 years old now!

Note that the GenomicFeatures and rtracklayer packages make it really
simple to import and work with those annotations in R/Bioconductor.

>
> 3. Do I just have to rename the Apis 4 genome package that Hervé posted
> in order to use it in goseq (I see that there are several naming
> conventions on the Annotation Data packages)?

I'll let Matt answer this.

>
> You can see that some of these questions are more appropriate for Hervé
> and some for Matt, so I decided to email both of you. Some of these
> issues arise simply because I’ve only been successful with the example
> in the goseq documentation (using the org.Hs.eg.db Annotation database).
> Others arise because I am just very new to R and the Bioconductor packages.

For what it's worth, I don't think there is any org.* package for the
honeybee (it would probably be named something like org.Am.eg.db if there
were one). And if there were one, you would need to double-check that the
annotations in it are actually compatible with whatever genome assembly
you finally decide to use.

>
> Thanks for any help you can offer. And apologies if this is the 100th
> time you’ve received an email about this from newbies such as myself.

No problem. Wish I could help more. I'm cc'ing the Bioconductor mailing
list (hope you don't mind). It's a better place to ask questions like
this, since other people might be able to help, and the whole
discussion will be archived and searchable for future reference.

Cheers,
H.

>
> Vanessa Corby-Harris
> Research Molecular Biologist
> USDA-ARS
> Carl Hayden Bee Research Center
> 2000 E. Allen Rd., Tucson, AZ 85719
> (520) 647-9269


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319