[BioC] from RefSeq GI protein identifiers to GO terms

Sean Davis sdavis2 at mail.nih.gov
Tue Jun 12 11:59:55 CEST 2007


Lina Hultin-Rosenberg wrote:
> Dear list,
> 
> This might be a question that has been discussed previously but I could not
> find any good solution for it. I have lists of human proteins from various
> proteomics studies that I want to compare with regard to the GO terms
> associated with them. I have the RefSeq GI protein id for the proteins and my
> question is how I best map those to other identifiers that I can use in
> subsequent GO analysis?
> 
> It might be that this problem is best solved outside R but maybe someone
> can still give me a hint toward the best solution. For me this is a problem that
> comes up quite often - the need to map between different identifiers - and I
> have not yet found any really good solution to it. If I for example use IPI I
> always lose some proteins/genes since the coverage is rather bad, but maybe
> there is no solution that will give perfect mapping?!

The file located here:

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

and described in detail here:

ftp://ftp.ncbi.nih.gov/gene/DATA/README

maps RefSeq accessions and GI numbers to Entrez Gene IDs.  Once you have
the Entrez Gene ID, you can use the Bioconductor annotation packages to
get GO mappings.  The file above is a gzipped, tab-delimited text file,
so you should be able to read it into R and do the matching by GI number
rather easily.
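
A rough sketch of the matching step, not a definitive recipe. To stay
self-contained it runs on a few made-up rows rather than the real download.
The column positions used here (tax_id in column 1, GeneID in column 2,
protein GI in column 7) are an assumption and should be checked against the
README; the names g2r, human and my_gi are only illustrative.

```r
# Toy stand-in for gene2refseq; all values below are invented.
# Assumed layout: tax_id, GeneID, status, RNA accession, RNA gi,
# protein accession, protein gi (verify against the README).
toy <- paste(
  "9606\t1\tREVIEWED\tNM_000001.1\t111\tNP_000001.1\t1001",
  "9606\t2\tREVIEWED\tNM_000002.1\t222\tNP_000002.1\t1002",
  "10090\t3\tVALIDATED\tNM_000003.1\t333\tNP_000003.1\t1003",
  "9606\t4\tMODEL\tXM_000004.1\t444\t-\t-",
  sep = "\n"
)

# For the real file one would pass gzfile("gene2refseq.gz") instead of
# text = toy, probably with comment.char = "#" to skip the header line.
g2r <- read.delim(text = toy, header = FALSE, quote = "",
                  na.strings = "-", colClasses = "character")
names(g2r)[c(1, 2, 7)] <- c("tax_id", "GeneID", "protein_gi")

# Keep human rows (NCBI taxonomy ID 9606) that carry a protein GI.
human <- g2r[g2r$tax_id == "9606" & !is.na(g2r$protein_gi), ]

# Hypothetical query list; GIs are kept as character to avoid
# integer overflow and factor surprises.
my_gi <- c("1002", "1001", "9999")

# match() returns NA for GIs that are not in the table.
result <- data.frame(
  protein_gi = my_gi,
  GeneID = human$GeneID[match(my_gi, human$protein_gi)],
  stringsAsFactors = FALSE
)
print(result)
```

GIs with no match come back as NA, which makes it easy to see how many
proteins were lost in the mapping. The GeneID column can then be used as the
key for a Bioconductor annotation package (for example org.Hs.eg.db in current
releases) to retrieve GO terms.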

Hope that helps.

Sean