[BioC] metadata for Affymetrix Poplar array
Nianhua Li
nli at fhcrc.org
Fri Feb 23 03:26:05 CET 2007
Hi, Dick,
AnnBuilder won't work for poplar right away. Here is a mini guide.
You can also follow these instructions to enable AnnBuilder for other
organisms.
(Dick, I am sorry but it seems almost hopeless for poplar.)
A term definition before we start:
organism name: I will use this term throughout the email. The
organism name for human is "Homo sapiens". The function "ABPkgBuilder"
has an argument "organism".
So, if you want to build annotation for human genes, give "Homo
sapiens" as the value for "organism". The function uses the
argument value to find data in the UCSC Genome Database, IPI, KEGG and
UniGene.
1. Make sure the organism is supported by Entrez Gene:
1.1 Search http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy
with your organism name to find its NCBI taxonomy id. Populus sp. is 3697.
1.2 Check whether the taxonomy id is included in the files at
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ . You want to check
gene2accession, gene2pubmed, gene2refseq, gene2unigene and mim2gene.
Check "README" to see if your organism is included in gene2go. Poplar
is not on the list, so you won't get GO annotation.
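The check in 1.2 can be scripted once a file is downloaded. Below is a minimal sketch, assuming the file is tab-delimited with the taxonomy id in the first column and comment lines starting with "#". The helper name "hasTaxId" is made up for illustration, and the toy file stands in for a real download such as gene2accession.

```r
# Hypothetical helper: TRUE if `tax_id` occurs in the first column of a
# tab-delimited file. Lines starting with "#" are treated as comments.
hasTaxId <- function(path, tax_id) {
  # gzfile() reads both gzip-compressed and plain text files
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  repeat {
    # read in chunks, since the real files are large
    lines <- readLines(con, n = 100000L)
    if (length(lines) == 0L) return(FALSE)
    lines <- lines[!startsWith(lines, "#")]
    ids <- sub("\t.*$", "", lines)   # keep only the first column
    if (as.character(tax_id) %in% ids) return(TRUE)
  }
}

# Toy file with made-up rows, standing in for a downloaded NCBI file
demo <- tempfile(fileext = ".txt")
writeLines(c("#tax_id\tGeneID\tstatus",
             "9606\t1\tREVIEWED",
             "9031\t2\tPROVISIONAL"), demo)

hasTaxId(demo, 9606)   # id present in the toy file
hasTaxId(demo, 3697)   # poplar id from 1.1, absent from the toy file
```

Run it once per file (gene2accession, gene2pubmed, and so on) to see which annotation sources will be empty for your organism.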
2. Check KEGG
Find your organism in the file ftp.genome.ad.jp/pub/kegg/tarfiles/genome.
Make sure the organism name is consistent with the
value of the "DEFINITION" field in this file. Populus sp. is not on the
list, but there are "Populus tremula" and "Populus balsamifera". (KEGG
is temporarily down right now.)
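To compare your organism name with the KEGG list, something like the following may help. It is only a sketch: it assumes each record carries a line that starts with "DEFINITION" followed by whitespace and the organism name. The function name "keggDefinitions" and the toy records (including the entry ids) are invented for illustration.

```r
# Hypothetical helper: pull the values of all "DEFINITION" fields out of
# the lines of a KEGG-style flat file.
keggDefinitions <- function(lines) {
  hits <- grep("^DEFINITION[[:space:]]+", lines, value = TRUE)
  trimws(sub("^DEFINITION[[:space:]]+", "", hits))
}

# Toy records; in practice use readLines() on the downloaded "genome" file
toy <- c("ENTRY       X00001",
         "DEFINITION  Populus tremula",
         "///",
         "ENTRY       X00002",
         "DEFINITION  Populus balsamifera",
         "///")

defs <- keggDefinitions(toy)
"Populus sp." %in% defs       # exact organism name not listed
"Populus tremula" %in% defs   # a related species is listed
```

The comparison is an exact string match on purpose, because the organism name passed to ABPkgBuilder has to be consistent with the DEFINITION value.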
3. Check UCSC Genome Database:
Go to the UCSC Genome Database FTP site
ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/ and find the
folder name corresponding to your organism. Poplar is not available, so
you won't get information about chromosome location (CHRLOC). If your
organism is supported, modify the function "getUCSCUrl" in file
"getSrcUrl.R" to add the folder name. For example, this is the change for
chicken. "GALLUS GALLUS" is the organism name in upper case.
"Gallus_gallus" is the folder name.
=================================================================
--- getSrcUrl.R (old)
+++ getSrcUrl.R (new)
@@ -65,6 +65,7 @@
 "DANIO RERIO" = "Danio_Rerio",
 "CAENORHABDITIS ELEGANS" = "Caenorhabditis_elegans",
 "DROSOPHILA MELANOGASTER" = "Drosophila_melanogaster",
+"GALLUS GALLUS" = "Gallus_gallus",
 NA)
 if(is.na(key)) {
 warning(paste("Organism", organism, "is not supported by GoldenPath (GP)."))
==================================================================
Similarly, add the folder name to function "getPubDataGoldenPath" in
file "downloadSourceData.R".
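For orientation, the lookup that the patch touches can be pictured roughly as below. This is a simplified stand-in, not the actual body of "getUCSCUrl": the function name "ucscFolder" is made up, and only the entries visible in the patch are included.

```r
# Simplified stand-in for the organism -> GoldenPath folder lookup.
# The organism name is upper-cased before matching, as in the patch.
ucscFolder <- function(organism) {
  key <- switch(toupper(organism),
                "DANIO RERIO" = "Danio_Rerio",
                "CAENORHABDITIS ELEGANS" = "Caenorhabditis_elegans",
                "DROSOPHILA MELANOGASTER" = "Drosophila_melanogaster",
                "GALLUS GALLUS" = "Gallus_gallus",
                NA)   # default when no entry matches
  if (is.na(key)) {
    warning(paste("Organism", organism,
                  "is not supported by GoldenPath (GP)."))
  }
  key
}

ucscFolder("Gallus gallus")                   # folder name for chicken
suppressWarnings(ucscFolder("Populus sp."))   # NA: no folder for poplar
```

The point is that every new organism needs one more "ORGANISM NAME" = "Folder_name" pair before the final NA.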
4. Check UniGene (only necessary when you use "ug" or "gbNRef" as
baseMapType to invoke ABPkgBuilder):
Look at the function "UGSciNames" in file "getSrcUrl.R" and check whether
your organism is on the list. If not, visit
ftp://ftp.ncbi.nih.gov/repository/UniGene, find the
folder for your organism, go inside the folder, and find *.data.gz. I
can't find "Populus sp.", but there are "Populus_trichocarpa" and
"Populus_tremula_x_Populus_tremuloides". The file for
Populus_trichocarpa is "Pth.data.gz". "Pth" is the "UGSciName". Add it
to the R function. Make sure it is mapped to the organism name.
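As an illustration of how the organism name, the folder and the "UGSciName" fit together, here is a small sketch. The table "ugSciNames" and the function "ugDataUrl" are hypothetical and do not reproduce the real "UGSciNames" function. The sketch also assumes the folder name is simply the organism name with spaces replaced by underscores, which matches the two folders mentioned above.

```r
# Hypothetical lookup table: organism name -> UniGene file prefix.
# Only the Populus trichocarpa entry is taken from this email.
ugSciNames <- c("Populus trichocarpa" = "Pth")

# Hypothetical helper: build the location of <prefix>.data.gz
ugDataUrl <- function(organism, codes = ugSciNames,
                      base = "ftp://ftp.ncbi.nih.gov/repository/UniGene") {
  code <- unname(codes[organism])
  if (is.na(code)) {
    warning(paste("Organism", organism, "has no UGSciName entry."))
    return(NA_character_)
  }
  # Assumption: folder name = organism name with underscores
  folder <- gsub(" ", "_", organism, fixed = TRUE)
  paste(base, folder, paste0(code, ".data.gz"), sep = "/")
}

ugDataUrl("Populus trichocarpa")
suppressWarnings(ugDataUrl("Populus sp."))   # NA: not in the table
```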
5. Check IPI:
Go to ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ and find the
folder corresponding to your organism. Poplar is not supported, so you
won't get cross-references between genes and PFAM (and PROSITE). If your
organism is supported, modify function "speciesNorganism" in file
"IPI.R" to add your organism. For example, the mapping for human is:
c("human", "Homo sapiens"),
"human" is the folder name in all lower case. "Homo sapiens" is the
organism name.
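The folder/organism pairs can be thought of as a small two-column table. The sketch below shows one way to look up the folder from the organism name. It is not the actual code of "speciesNorganism": the names "ipiPairs" and "ipiFolder" are invented, and only the human pair comes from this email.

```r
# Hypothetical table of (folder name, organism name) pairs
ipiPairs <- rbind(c("human", "Homo sapiens"))
colnames(ipiPairs) <- c("folder", "organism")

# Hypothetical helper: IPI folder for an organism, or NA if unsupported
ipiFolder <- function(organism, pairs = ipiPairs) {
  # compare case-insensitively, since folder names are lower case
  # while organism names are usually capitalised
  hit <- pairs[toupper(pairs[, "organism"]) == toupper(organism), "folder"]
  if (length(hit) == 0L) NA_character_ else unname(hit[1])
}

ipiFolder("Homo sapiens")   # folder for human
ipiFolder("Populus sp.")    # NA: poplar has no IPI folder
```

Supporting another organism then amounts to adding one more c("folder", "Organism name") row.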
Hope this is helpful and good luck!
nianhua