[BioC] Building the tomato annotation library(Affy)

Fri May 11 15:58:00 CEST 2007

Hello Martin and Nianhua,

Thank you very much for solving the issues in AnnBuilder!!
I successfully built the tomato annotation library and even improved it a little by adding the Affy Entrez IDs via the otherSrc section. This is my tomatoQC():

Quality control information for  tomato
Date built: Created: Thu May 10 17:33:37 2007
Number of probes: 10209
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
         tomatoACCNUM found 10198 of 10209
         tomatoCHR found 0 of 10209
         tomatoENTREZID found 1296 of 10209
         tomatoENZYME found 0 of 10209
         tomatoGENENAME found 1296 of 10209
         tomatoMAP found 0 of 10209
         tomatoPATH found 0 of 10209
         tomatoPMID found 783 of 10209
         tomatoREFSEQ found 2 of 10209
         tomatoSYMBOL found 1296 of 10209
         tomatoUNIGENE found 1296 of 10209
Mappings found for non-probe based rda files:
         tomatoORGANISM found 1
         tomatoPMID2PROBE found 360

which is a minimal improvement over what was obtained first (ENTREZID mappings went from 1288 to 1296 of 10209). Using unigene (instead of gb) did not improve it (on the contrary).

The only problem I have now is that the GO annotation is totally missing, whereas it is available in the Affymetrix annotation library. Furthermore, the CHRLOC environment is missing (among other environments, I guess) and this causes an inconsistency: if the information cannot be retrieved, why not include a vector with only NAs? That way the (now missing) environments would at least exist, and code that uses them (in this case, my script) would not break. For the moment, I solved it by checking whether tomato is being analysed or not, but including at least an empty vector is a "nicer" solution (in my opinion). As for the poor annotation of the tomato array: well, we expected this. Thank you anyway for helping us out!

Regards,

Philip

________________________________

From: Nianhua Li [mailto:nli at fhcrc.org]
Sent: Wed 9-5-2007 20:28
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Building the tomato annotation library(Affy)

Dear Dr Philip de Groot,

Thanks for the report and sorry for the late reply. The "refLink.txt" error was
intended, but we have changed it to a more informative message. The "sort.list"
error was a bug. It happens when parsing KEGG pathway/enzyme data. The KEGG data
file usually contains both pathway and enzyme data for a given organism. But it
only has pathway data for tomato (actually ESTs only). This broke the code. We
have updated AnnBuilder. Please try the latest one in the bioc 2.1 repository or
download it from http://bioconductor.org/packages/2.1/bioc/html/AnnBuilder.html

A test run for Affymetrix tomato array shows that the annotation is very sparse.
Here is the QC data, just FYI:
Mappings found for probe based rda files:
         tomatoACCNUM found 10198 of 10209
         tomatoCHR found 0 of 10209
         tomatoENTREZID found 1288 of 10209
         tomatoENZYME found 0 of 10209
         tomatoGENENAME found 1288 of 10209
         tomatoMAP found 0 of 10209
         tomatoPATH found 0 of 10209
         tomatoPMID found 778 of 10209
         tomatoREFSEQ found 2 of 10209
         tomatoSYMBOL found 1288 of 10209
         tomatoUNIGENE found 1288 of 10209
Mappings found for non-probe based rda files:
         tomatoORGANISM found 1
         tomatoPMID2PROBE found 359

We only used the GenBank IDs from the Affymetrix csv file, just as you
did. You can also extract the Entrez IDs from the csv file and pass them as
"otherSrc" to ABPkgBuilder. That may increase the annotation coverage.
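
A rough sketch of that extraction step, written in Python purely for
illustration. It assumes the csv file has columns named "Probe Set ID" and
"Entrez Gene", that missing values are written as "---", and that multiple IDs
in one cell are separated by "///"; please check these against your own file.
The function names are made up for this example, and the two-column
tab-delimited output is only an assumption about what "otherSrc" expects, so
consult the AnnBuilder documentation for the exact layout.

```python
import csv


def entrez_pairs(csv_text, probe_col="Probe Set ID", entrez_col="Entrez Gene"):
    """Return (probe_id, entrez_id) pairs for probes that have an Entrez ID.

    Column names and the "---" / "///" conventions are assumptions about the
    annotation csv file; adjust them if your file differs.
    """
    # Drop leading comment lines (assumed to start with '#').
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("#")]
    pairs = []
    for row in csv.DictReader(lines):
        probe = row[probe_col].strip()
        raw = row[entrez_col].strip()
        if not raw or raw == "---":
            continue  # no Entrez ID for this probe
        # Keep only the first ID when several are listed (a simplification).
        pairs.append((probe, raw.split("///")[0].strip()))
    return pairs


def to_tab_delimited(pairs):
    """Format the pairs as one 'probe<TAB>id' line per probe."""
    return "".join(f"{probe}\t{entrez}\n" for probe, entrez in pairs)
```

Keeping only the first ID is a simplification to get one ID per probe; whether
a probe with several IDs should be kept, split, or dropped is a choice you
would have to make for your own data.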

Good luck,

Martin and Nianhua