[BioC] GEOquery and different types of GPL annotation files

Peter bioconductor-mailinglist at maubp.freeserve.co.uk
Sat Jan 21 13:36:28 CET 2006


Sean Davis wrote:
> The Annotation Soft files are built by GEO staff when they build a GEO
> dataset.  They use whatever public identifier they can in the submitted GPL
> to do lookups on their own of what the features on the array represent.
> They are NOT available for every GPL, only those that are attached to a GDS.
> They do not necessarily agree with the original submitted GPL.  They are not
> currently handled by GEOquery.  However, as you noted in another post,
> Peter, the original GPLs as submitted by users are often larger than those
> built by GEO, so I haven't found a strong reason to work with the Annotation
> Soft files.  In fact, I typically use the GPL information only for lookup of
> some primary key  (genbank accession, affy id, or something like that) and
> then build the annotation myself (or use a bioconductor annotation package),
> as the methods used to generate annotation can be quite varied and the time
> since last update (in the case of GPLs, never updated) is important.
> 
> Hope that helps clarify things a bit.

A bit - but with two different GPL files it's a little tricky following 
which one you mean at each point.

Does the GEO team have some official terms for the two types?

I'm a little unclear which are the "Annotation Soft files" built by 
GEO staff and which are the "original GPLs as submitted by users", but 
I think I have worked it out:

The GPL96 via the website (the 12MB file) does have GO terms, plus a 
list of experiments using the platform (GSM and GSE references):

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full

I'm guessing this is the one built by GEO staff (due to the GSM and GSE 
references).

In the case of GPL96, the smaller 3MB file from the FTP site 
(GPL96.annot.gz) seems to have a lot of useful cross references (but no 
GO terms):

ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/by_platform/annot/GPL96.annot.gz

Is this therefore based on the data Affy submitted to describe their 
human chip?

For my basic exploration of GEO files and microarray analysis, this 
smaller file is actually more useful - but it's not supported by 
GEOquery.  Is adding support for this style of GPL file likely to be a 
"big job" do you think?

I do take your point that using a bioconductor annotation package may be 
less hassle (I don't care to build my own annotation yet).
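
On the question of supporting the smaller .annot.gz style of file, 
here is a minimal sketch of reading the table out of one.  It is in 
Python rather than R, and it is not GEOquery code.  It assumes the 
annotation file follows the usual SOFT layout, with a tab-delimited 
table (header row first) between "!platform_table_begin" and 
"!platform_table_end" lines; I have not checked that against every 
.annot.gz file.  The function names and the sample data are made up 
for illustration.

```python
import csv
import gzip
import io


def read_soft_table(lines):
    """Return (header, rows) for the tab-delimited table in SOFT-style text.

    Assumption: the table sits between '!platform_table_begin' and
    '!platform_table_end' marker lines, with the column names on the
    first line after the begin marker.
    """
    in_table = False
    table_lines = []
    for line in lines:
        line = line.rstrip("\r\n")
        marker = line.lower()
        if marker.startswith("!platform_table_begin"):
            in_table = True
            continue
        if marker.startswith("!platform_table_end"):
            break
        if in_table:
            table_lines.append(line)

    # QUOTE_NONE: treat quote characters in annotation text as plain data.
    parsed = list(csv.reader(table_lines, delimiter="\t",
                             quoting=csv.QUOTE_NONE))
    if not parsed:
        return [], []
    header = parsed[0]
    rows = [dict(zip(header, fields)) for fields in parsed[1:]]
    return header, rows


def read_annot_gz(path):
    """Hypothetical helper: parse a gzip-compressed file already on disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return read_soft_table(handle)


# Invented sample for illustration only, NOT real GPL96 content.
SAMPLE = (
    "^PLATFORM = GPL_EXAMPLE\n"
    "#ID = probe identifier\n"
    "!platform_table_begin\n"
    "ID\tGene symbol\n"
    "probe_1\tGENE_A\n"
    "probe_2\tGENE_B\n"
    "!platform_table_end\n"
)

header, rows = read_soft_table(io.StringIO(SAMPLE))
```

If the layout assumption holds, read_annot_gz("GPL96.annot.gz") on a 
downloaded copy should give the column names plus one dictionary per 
probe, which is enough for looking up a primary key.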

Thank you

Peter


