[BioC] metadata for Affymetrix Poplar array
Nianhua Li
nli at fhcrc.org
Fri Feb 23 19:13:02 CET 2007
> Hi, Dick,
>
> Here is some additional information:
>
> You can extract the probeset-to-EntrezGene mapping from Affymetrix's
> annotation file, pass it as "otherSrc", and feed it to ABPkgBuilder:
>
>> ABPkgBuilder(baseName="affy_poplar_GeneBank_for_AnnBuilder.txt",
>>              baseMapType="gbNRef",
>>              pkgName="poplar",
>>              pkgPath=".",
>>              organism="Populus trichocarpa",
>>              version="1.12.0",
>>              otherSrc=c(
>>                  EG="affy_poplar_Entrez_for_AnnBuilder.txt"),
>>              author=list(
>>                  authors="Dick Beyer",
>>                  maintainer="Dick Beyer..."
>>              )
>> )
>
> AnnBuilder will use GenBank mapping as the primary source to find
> Entrez Gene mappings for the probesets. If any probeset doesn't have
> mappings, AnnBuilder will use the file given as "otherSrc" as a
> supplement. So you can get better annotation coverage.
>
>> I am not sure if this whole approach will ultimately be correct as the
>> Affy poplar array has 13 different Populus species on it, with Populus
>> trichocarpa only one of them.
>
> This won't be a big problem in your case. AnnBuilder extracts
> annotations from Entrez Gene by using Entrez Gene IDs, not taxonomy
> IDs. The organism argument will only affect the following annotations:
> pathway from KEGG, PROSITE and PFAM cross-reference from IPI, and
> chromosome location from UCSC Genome. Neither IPI nor UCSC supports any
> Populus species. KEGG supports Populus tremula (aspen) (EST) (eptp)
> and Populus balsamifera (poplar) (EST) (epba), but only has
> gene-pathway mappings for epba. The mapping is for ESTs, not for genes,
> so it may not match any Entrez Gene IDs at all. If you want to use this
> mapping, give "Populus balsamifera (poplar) (EST)" as organism. I am
> not sure whether you need the whole string or just the Latin name
> part. But then it will conflict with UniGene, because UniGene only
> supports Populus_trichocarpa and
> Populus_tremula_x_Populus_tremuloides. UniGene is less important. It
> is only used as a supplemental source for probeset to Entrez Gene
> mapping. If you give the probeset-to-EntrezGene mapping as the baseName
> and set baseMapType to "ll", you can bypass UniGene.
>
> To summarize, there are two options:
> 1. Use the above script to invoke AnnBuilder and add
> "Populus_trichocarpa=Pth" to function "UGSciNames" in file "getSrcUrl.R".
>
> 2. Change organism to "Populus balsamifera" and use the
> probeset-to-EntrezGene mapping as baseName and "ll" as baseMapType.
>
> The bottom line is that you can get gene name, gene symbol,
> chromosome, cytogenetic band, pubmed, unigene, refseq, and entrez gene
> for your probesets.
>
> Let me know if you need any help, and good luck.
>
> nianhua
>
>
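
The primary-plus-supplement behaviour described above (use the GenBank-derived Entrez Gene mapping first, then fill gaps from the "otherSrc" file) can be illustrated with a small base R sketch. This is only an illustration of the idea, not AnnBuilder's internal code: the function name fill_from_supplement and all probe and gene IDs are hypothetical.

```r
# Hypothetical probeset-to-Entrez Gene mappings; every ID here is made up.
# primary: mapping derived from GenBank accessions (NA = no mapping found).
primary <- c(probe_1 = "100001", probe_2 = NA,
             probe_3 = "100003", probe_4 = NA)
# supplement: mapping as it might be read from the file given as otherSrc.
supplement <- c(probe_2 = "200002", probe_3 = "200003")

# Keep the primary mapping where it exists and fill only the gaps
# from the supplement (hypothetical helper, not part of AnnBuilder).
fill_from_supplement <- function(primary, supplement) {
  merged <- primary
  missing <- is.na(merged)
  # Look up the unmapped probes by name; probes absent from the
  # supplement stay NA.
  merged[missing] <- supplement[names(merged)[missing]]
  merged
}

merged <- fill_from_supplement(primary, supplement)
print(merged)
```

In this sketch probe_3 keeps its primary value even though the supplement also lists it, and probe_4 stays unmapped because neither source covers it.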