[BioC] annotation package for chicken affyprobes

Mon Aug 28 18:25:22 CEST 2006

Hi again!

I managed to build the annotation package for chicken - thanks for all your
help! 

I was a bit surprised, though, by the low annotation coverage (see the QC
data below). I don't really know how the data are collected, but I would have
expected chromosome locations to be known for more of the probesets. The UCSC
page on the new chicken genome assembly (http://genome.ucsc.edu) says that
around 95% of the sequence has been anchored to chromosomes. I thought the
annotation process in R used this information?

What could be the reason that so few probesets are anchored? I might be doing
something wrong, or perhaps it is a problem of mapping probe IDs to other
identifiers? I used the GenBank mappings as my base file and the UniGene and
Entrez mappings as other sources. Is there a way within R to increase the
annotation coverage? I am especially interested in chromosome location
(chromosome number), but maybe this problem is best solved outside R?

Would greatly appreciate some help!

Thank you, 
Lina


=======================================================================
QC data:
Number of probes: 38535
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
         chickenACCNUM found 25654 of 38535
         chickenCHR found 9707 of 38535
         chickenCHRLOC found 156 of 38535
         chickenENZYME found 52 of 38535
         chickenGENENAME found 0 of 38535
         chickenGO found 4224 of 38535
         chickenLOCUSID found 9722 of 38535
         chickenMAP found 0 of 38535
         chickenPATH found 87 of 38535
         chickenPMID found 283 of 38535
         chickenREFSEQ found 9709 of 38535
         chickenSUMFUNC found 0 of 38535
         chickenSYMBOL found 9722 of 38535
         chickenUNIGENE found 289 of 38535
Mappings found for non-probe based rda files:
         chickenENZYME2PROBE found 33
         chickenGO2ALLPROBES found 1785
         chickenGO2PROBE found 930
         chickenORGANISM found 1
         chickenPATH2PROBE found 31
         chickenPFAM found 7418
         chickenPMID2PROBE found 101
         chickenPROSITE found 5490
========================================================================== 
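
To make the coverage figures above easier to compare, the "found X of Y" lines can be turned into percentages. The following is only a minimal base-R sketch: the `qc` vector is a hand-copied subset of the log above, and the names `pattern` and `coverage` are illustrative, not part of AnnBuilder.

```r
# A few "found X of Y" lines copied by hand from the QC output above.
qc <- c(
  "chickenACCNUM found 25654 of 38535",
  "chickenCHR found 9707 of 38535",
  "chickenCHRLOC found 156 of 38535",
  "chickenLOCUSID found 9722 of 38535",
  "chickenUNIGENE found 289 of 38535"
)

# Capture the mapping name, the number found and the total probe count.
pattern <- "^[[:space:]]*([^[:space:]]+) found ([0-9]+) of ([0-9]+)[[:space:]]*$"

coverage <- data.frame(
  mapping = sub(pattern, "\\1", qc),
  found   = as.integer(sub(pattern, "\\2", qc)),
  total   = as.integer(sub(pattern, "\\3", qc)),
  stringsAsFactors = FALSE
)

# Percentage of probes with a mapping, sorted from lowest to highest coverage.
coverage$percent <- round(100 * coverage$found / coverage$total, 1)
coverage <- coverage[order(coverage$percent), ]
print(coverage)
```

Read this way, CHR, LOCUSID, REFSEQ and SYMBOL all sit at roughly the same count, which may indicate that the limiting step is linking probes to gene identifiers rather than the chromosome data itself.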

-----Original message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On behalf of Nianhua Li
Sent: 17 August 2006 03:15
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] annotation package for chicken affyprobes

Hi, Lina,

We are lucky in the case of chicken. I just updated AnnBuilder (v1.11.7) to
support Gallus gallus (taxon id 9031). You can get it from svn right
away, or wait 2 days and download it from the BioC website. Or, if you want
to try it now, here are the changes (the diffs run from new to old, so the
lines marked "-" are the ones I added):

================================================================

--- IPI.R       (new)
+++ IPI.R       (old)
@@ -21,8 +21,7 @@
     speciesNorganismTable <- rbind(
                                    c("human", "Homo sapiens"),
                                    c("mouse", "Mus musculus"),
-                                   c("rat", "Rattus norvegicus"),
-                                  c("chick", "Gallus gallus")
+                                   c("rat", "Rattus norvegicus")
                                    )
     colnames(speciesNorganismTable) <- c("species", "organism")
     return(speciesNorganismTable)
=================================================================
The above change is in the function "speciesNorganism". It allows you to
get annotation for PFAM and PROSITE.
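
As a rough illustration of why the extra row matters, here is a minimal sketch of a species lookup against such a table. It reproduces only the rows visible in the diff; `speciesTableSketch` and `lookupSpecies` are hypothetical names, and the real AnnBuilder code may use the table differently.

```r
# Hypothetical stand-in for the lookup table in IPI.R; only the rows
# visible in the diff are reproduced here.
speciesTableSketch <- function() {
    tbl <- rbind(
        c("human", "Homo sapiens"),
        c("mouse", "Mus musculus"),
        c("rat", "Rattus norvegicus"),
        c("chick", "Gallus gallus")
    )
    colnames(tbl) <- c("species", "organism")
    tbl
}

# Return the short species name for an organism, or NA if it is not listed.
lookupSpecies <- function(organism, tbl = speciesTableSketch()) {
    hit <- tbl[, "organism"] == organism
    if (!any(hit)) return(NA_character_)
    unname(tbl[hit, "species"])
}

chick <- lookupSpecies("Gallus gallus")   # listed once the "chick" row exists
missing <- lookupSpecies("Danio rerio")   # not in this sketch table, so NA
```

An organism without a row has no short name to look up, which is presumably why chicken had no PFAM or PROSITE annotation before the change.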

=================================================================
--- getSrcUrl.R (new)
+++ getSrcUrl.R (old)
@@ -65,7 +65,6 @@
                   "DANIO RERIO" = "Danio_Rerio",
                   "CAENORHABDITIS ELEGANS" = "Caenorhabditis_elegans",
                   "DROSOPHILA MELANOGASTER" = "Drosophila_melanogaster",
-                 "GALLUS GALLUS" = "Gallus_gallus",
                   NA)
     if(is.na(key)) {
         warning(paste("Organism", organism, "is not supported by GoldenPath (GP)."))
@@ -170,7 +169,6 @@
       Sma = "Schistosoma mansoni", Ssa = "Salmo salar",
       Ssc = "Sus scrofa", Str = "Xenopus tropicalis",
       Xl = "Xenopus laevis", At = "Arabidopsis thaliana",
-      Gga = "Gallus gallus",
       Gma = "Glycine max", Han = "Helianthus annus",
       Hv = "Hordeum vulgare",  Lsa = " Lactuca sativa",
       Les = "Lycopersicon esculentum", Lco = "Lotus corniculatus",
==================================================================
The first change is in function "getUCSCUrl", for chromosome location.
The second is in function "UGSciNames" for UniGene.

I tested it with this script:
===============================
library(AnnBuilder)
mypkg <- function(pkgPath, version) {
    ABPkgBuilder(baseName="mybase.txt",
                 baseMapType="ll",
                 pkgName="mypkg",
                 pkgPath=pkgPath,
                 organism="Gallus gallus",
                 version=version,
                 author=list(
                   authors="Nianhua Li",
                   maintainer="Nianhua Li<email at email.org>"
                   )
                 )
}
mypkg(getwd(), "1.0.0")
===============================

mybase.txt is
1       395929
2       395844
3       396017
4       415357
5       424377

QC data is:
Number of probes: 5
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
         mypkgACCNUM found 0 of 5
         mypkgCHRLOC found 4 of 5
         mypkgCHR found 5 of 5
         mypkgENZYME found 4 of 5
         mypkgGENENAME found 0 of 5
         mypkgGO found 5 of 5
         mypkgLOCUSID found 5 of 5
         mypkgMAP found 0 of 5
         mypkgPATH found 5 of 5
         mypkgPMID found 3 of 5
         mypkgREFSEQ found 5 of 5
         mypkgSUMFUNC found 0 of 5
         mypkgSYMBOL found 5 of 5
         mypkgUNIGENE found 5 of 5
Mappings found for non-probe based rda files:
         mypkgENZYME2PROBE found 5
         mypkgGO2ALLPROBES found 107
         mypkgGO2PROBE found 19
         mypkgORGANISM found 1
         mypkgPATH2PROBE found 18
         mypkgPFAM found 5
         mypkgPMID2PROBE found 4
         mypkgPROSITE found 3

Let me know if you have any questions or concerns. Cheers!

nianhua

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor