[BioC] from RefSeq GI protein identifiers to GO terms
Sean Davis
sdavis2 at mail.nih.gov
Tue Jun 12 11:59:55 CEST 2007
Lina Hultin-Rosenberg wrote:
> Dear list,
>
> This might be a question that has been discussed previously, but I could not
> find any good solution for it. I have lists of human proteins from various
> proteomics studies that I want to compare with regard to the GO terms
> associated with them. I have the RefSeq GI protein ID for each protein, and my
> question is how best to map those to other identifiers that I can use in
> subsequent GO analysis.
>
> It might be that this problem is best solved outside R, but maybe someone
> can still give me a hint toward the best solution. For me this is a problem that
> comes up quite often (the need to map between different identifiers), and I
> have not yet found any really good solution to it. If I use IPI, for example, I
> always lose some proteins/genes since the coverage is rather poor, but maybe
> there is no solution that will give a perfect mapping?
The file located here:
ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
and described in detail here:
ftp://ftp.ncbi.nih.gov/gene/DATA/README
maps RefSeq identifiers (including protein GI numbers) to Entrez Gene
IDs. Once you have the Entrez Gene IDs, you can use the Bioconductor
annotation packages to get GO mappings. The file above is a
tab-delimited text file, so you should be able to read it into R and do
the matching by GI number rather easily.
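
A minimal sketch of that matching step, in case it helps. It assumes the first seven columns of gene2refseq are tax_id, GeneID, status, RNA accession, RNA GI, protein accession and protein GI (please check this against the README, since the layout may differ), and it uses a tiny made-up table in place of the real file so that it runs as-is. The function name, column names and GI values are placeholders of mine, not part of the file.

```r
# Made-up stand-in for gene2refseq (first 7 columns only; all values are fake).
sample_lines <- c(
  "#tax_id\tGeneID\tstatus\tRNA_acc\tRNA_gi\tprotein_acc\tprotein_gi",
  "9606\t1\tREVIEWED\tNM_000001.1\t1001\tNP_000001.1\t2001",
  "9606\t2\tREVIEWED\tNM_000002.1\t1002\tNP_000002.1\t2002",
  "10090\t3\tPROVISIONAL\tNM_000003.1\t1003\tNP_000003.1\t2003"
)

# Hypothetical helper: read the table from any connection.
# Lines starting with "#" (the header) are skipped, so column names are
# assigned by position; that assignment is the assumption to verify.
read_gene2refseq <- function(con) {
  tab <- read.delim(con, header = FALSE, comment.char = "#", quote = "",
                    colClasses = "character", stringsAsFactors = FALSE)
  names(tab)[1:7] <- c("tax_id", "GeneID", "status", "rna_acc",
                       "rna_gi", "protein_acc", "protein_gi")
  tab
}

g2r <- read_gene2refseq(textConnection(sample_lines))
# For the real (downloaded) file, something like:
# g2r <- read_gene2refseq(gzfile("gene2refseq.gz"))

# Keep human rows only (NCBI taxonomy ID 9606).
human <- g2r[g2r$tax_id == "9606", ]

# Placeholder protein GI numbers; the last one is deliberately absent.
my_gi <- c("2002", "2001", "9999")

# match() returns NA for GIs with no entry, so unmapped proteins stay visible.
mapping <- data.frame(
  protein_gi = my_gi,
  GeneID     = human$GeneID[match(my_gi, human$protein_gi)],
  stringsAsFactors = FALSE
)
print(mapping)

# GO lookup from the Entrez Gene IDs, assuming the org.Hs.eg.db annotation
# package is installed (not run here):
# library(org.Hs.eg.db)
# ids <- unique(na.omit(mapping$GeneID))
# go  <- mget(ids, org.Hs.egGO, ifnotfound = NA)
```

Reading everything as character avoids trouble with the "-" placeholders used for missing values, and match() picks the first hit only, so if a GI appears on several rows you may prefer merge() instead.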
Hope that helps.
Sean