[BioC] Building the tomato annotation library(Affy)
Groot, Philip de
philip.degroot at wur.nl
Fri May 11 15:58:00 CEST 2007
Hello Martin and Nianhua,
Thank you very much for solving the issues in AnnBuilder!!
I successfully built the tomato annotation library and even improved it a little by adding the Affymetrix Entrez IDs via the otherSrc argument. This is the output of tomatoQC():
Quality control information for tomato
Date built: Created: Thu May 10 17:33:37 2007
Number of probes: 10209
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
tomatoACCNUM found 10198 of 10209
tomatoCHR found 0 of 10209
tomatoENTREZID found 1296 of 10209
tomatoENZYME found 0 of 10209
tomatoGENENAME found 1296 of 10209
tomatoMAP found 0 of 10209
tomatoPATH found 0 of 10209
tomatoPMID found 783 of 10209
tomatoREFSEQ found 2 of 10209
tomatoSYMBOL found 1296 of 10209
tomatoUNIGENE found 1296 of 10209
Mappings found for non-probe based rda files:
tomatoORGANISM found 1
tomatoPMID2PROBE found 360
This is a minimal improvement compared with what was obtained first. Using UniGene IDs (instead of "gb") did not improve it; on the contrary, coverage got worse. The only remaining problem is that the GO annotation is missing entirely, whereas it is available in the Affymetrix annotation library. Furthermore, the CHRLOC environment is missing (among other environments, I guess), and this causes an inconsistency: if the information cannot be retrieved, why not include a vector containing only NAs? That way the (now missing) environments would at least exist, and code that uses them (in this case, my script) would not break. For the moment, I have worked around it by checking whether tomato is being analysed, but including at least an empty vector would be a "nicer" solution, in my opinion. As for the poor annotation of the tomato array: well, we expected this. Thank you anyway for helping us out!
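To put the "minimal improvement" in numbers, the Python sketch below compares the probe-based counts from the two tomatoQC() reports in this thread: Nianhua's test run (GenBank IDs only) and the rebuild with Affymetrix Entrez IDs added via otherSrc. The counts are copied from those reports; the dictionary names and helper functions are illustrative only and not part of AnnBuilder.

```python
# Probe counts copied from the two tomatoQC() reports in this thread.
TOTAL_PROBES = 10209

# Nianhua's test run (GenBank IDs from the Affymetrix csv file only).
first_run = {"ENTREZID": 1288, "GENENAME": 1288, "PMID": 778, "SYMBOL": 1288, "UNIGENE": 1288}
# Rebuild with Affymetrix Entrez IDs supplied through otherSrc.
with_other_src = {"ENTREZID": 1296, "GENENAME": 1296, "PMID": 783, "SYMBOL": 1296, "UNIGENE": 1296}


def coverage(found: int, total: int = TOTAL_PROBES) -> float:
    """Percentage of probes that received at least one mapping."""
    return 100.0 * found / total


def compare(before: dict, after: dict) -> dict:
    """Return (probes gained, coverage before %, coverage after %) for each map."""
    return {
        name: (after[name] - before[name], coverage(before[name]), coverage(after[name]))
        for name in before
    }


if __name__ == "__main__":
    for name, (gained, pct_before, pct_after) in compare(first_run, with_other_src).items():
        print(f"{name:9s} +{gained:3d} probes  {pct_before:5.2f}% -> {pct_after:5.2f}%")
```

For ENTREZID this works out to 8 extra probes, i.e. roughly 12.6% versus 12.7% of the 10209 probes.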
Regards,
Philip
________________________________
From: Nianhua Li [mailto:nli at fhcrc.org]
Sent: Wed 9-5-2007 20:28
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Building the tomato annotation library(Affy)
Dear Dr Philip de Groot,
Thanks for the report and sorry for the late reply. The "refLink.txt" error was
intended, but we have changed it to a more informative message. The "sort.list"
error was a bug. It happens when parsing KEGG pathway/enzyme data. The KEGG data
file usually contains both pathway and enzyme data for a given organism. But it
only has pathway data for tomato (actually ESTs only). This broke the code. We
have updated AnnBuilder. Please try the latest version in the BioC 2.1 repository or
download it from http://bioconductor.org/packages/2.1/bioc/html/AnnBuilder.html
A test run for the Affymetrix tomato array shows that the annotation is very sparse.
Here is the QC data, just FYI:
Mappings found for probe based rda files:
tomatoACCNUM found 10198 of 10209
tomatoCHR found 0 of 10209
tomatoENTREZID found 1288 of 10209
tomatoENZYME found 0 of 10209
tomatoGENENAME found 1288 of 10209
tomatoMAP found 0 of 10209
tomatoPATH found 0 of 10209
tomatoPMID found 778 of 10209
tomatoREFSEQ found 2 of 10209
tomatoSYMBOL found 1288 of 10209
tomatoUNIGENE found 1288 of 10209
Mappings found for non-probe based rda files:
tomatoORGANISM found 1
tomatoPMID2PROBE found 359
We used only the GenBank IDs from the Affymetrix CSV file, just as you
did. You can also extract the Entrez IDs from the CSV file and pass them to
ABPkgBuilder as "otherSrc". This may increase the annotation coverage.
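A minimal Python sketch of the extraction step suggested above. It rests on several assumptions that should be checked against the actual files and against ?ABPkgBuilder: that the Affymetrix annotation CSV has "Probe Set ID" and "Entrez Gene" columns, marks missing values with "---" and separates multiple IDs with "///", and that otherSrc expects a tab-separated, header-less file with one probe ID and one Entrez ID per line. The function names and file names are hypothetical.

```python
import csv
import sys
from typing import Iterable, Iterator

PROBE_COL = "Probe Set ID"   # assumed Affymetrix column name
ENTREZ_COL = "Entrez Gene"   # assumed Affymetrix column name
MULTI_SEP = "///"            # assumed separator between multiple IDs
MISSING = "---"              # assumed placeholder for "no value"


def entrez_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (probe set ID, Entrez Gene ID) pairs from an Affymetrix annotation CSV."""
    # Some Affymetrix CSVs start with '#%' metadata lines; skip any comment lines.
    data = (line for line in lines if not line.startswith("#"))
    for row in csv.DictReader(data):
        probe = row[PROBE_COL].strip()
        for gene_id in row[ENTREZ_COL].split(MULTI_SEP):
            gene_id = gene_id.strip()
            if gene_id and gene_id != MISSING:
                yield probe, gene_id


def format_other_src(pairs: Iterable[tuple[str, str]]) -> str:
    """Render pairs as tab-separated 'probe<TAB>entrez' lines without a header."""
    return "".join(f"{probe}\t{gene_id}\n" for probe, gene_id in pairs)


if __name__ == "__main__":
    # Hypothetical usage: python make_othersrc.py Tomato_annot.csv tomato_entrez.txt
    with open(sys.argv[1], newline="") as src, open(sys.argv[2], "w") as dst:
        dst.write(format_other_src(entrez_pairs(src)))
```

Probes mapped to several Entrez IDs are written as several lines here; whether AnnBuilder prefers that or a single line per probe is also an assumption worth verifying before passing the file path to ABPkgBuilder via otherSrc.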
Good luck,
Martin and Nianhua