[BioC] IPI to entrez id
Samuel GRANJEAUD - IR/ICIM
granjeau at tagc.univ-mrs.fr
Tue Feb 22 09:44:03 CET 2011
Hi,
It looks like all of your IDs have been removed from IPI. You can still
find them in UniParc. Since your list is only 10 items, you had better do
the gene ID retrieval by hand. http://www.ebi.ac.uk/uniparc/ lists a few
links for querying UniParc.
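Here is the code I used to check your IDs. Sorry, there is some ugly
Perl: first to reformat the XML, then to extract the interesting fields.
Each output line lists id, version, created, last and active, and active
is "N" for every version of every ID.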
http://www.xaprb.com/blog/2006/10/05/five-great-perl-programming-techniques-to-make-your-life-fun-again/
Regards.
~$ cat myId.txt
IPI00055954
IPI00221338
IPI00465149
IPI00554793
IPI00028262
IPI00412977
IPI00105532
IPI00411514
IPI00746388
IPI00419266
~$ wget "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=uniparc&id=IPI00055954+IPI00221338+IPI00465149+IPI00554793+IPI00028262+IPI00412977+IPI00105532+IPI00411514+IPI00746388+IPI00419266&format=default&style=default&Retrieve=Retrieve" -O myIPI.xml
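The query URL in the wget call was typed out by hand. As a minimal sketch, assuming the same dbfetch URL layout and a myId.txt with one identifier per line and no blank lines, the id parameter could be built from the file instead. The function name build_dbfetch_url is made up for this illustration.

```shell
# Hypothetical helper: build the dbfetch query URL from a file of IDs.
# Assumes one identifier per line and the URL layout of the wget call.
build_dbfetch_url() {
    # paste -s joins all lines into one; -d+ puts "+" between them.
    ids=$(paste -s -d+ "$1")
    printf '%s\n' "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=uniparc&id=${ids}&format=default&style=default&Retrieve=Retrieve"
}
```

It could then be used as: wget "$(build_dbfetch_url myId.txt)" -O myIPI.xml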
~$ perl -pe 's/>\s+<dbRef/>\n<dbRef/g; s/\/>/\/>\n/g; s/></>\n</g' myIPI.xml \
     | grep 'type="IPI"' \
     | perl -ane '%t = map { split(/=/, $_) } grep { m/=/ } split(/ |\/>/, $_); print join(" - ", map($t{$_}, qw(id version created last active))),"\n"' \
     | grep -f myId.txt | sort
"IPI00028262" - "1" - "2003-03-14" - "2009-06-17" - "N"
"IPI00055954" - "1" - "2003-03-14" - "2003-10-03" - "N"
"IPI00055954" - "2" - "2003-11-07" - "2007-07-14" - "N"
"IPI00055954" - "3" - "2007-08-08" - "2007-10-24" - "N"
"IPI00055954" - "4" - "2007-11-13" - "2009-07-09" - "N"
"IPI00105532" - "1" - "2003-03-14" - "2006-03-03" - "N"
"IPI00105532" - "2" - "2006-04-04" - "2006-10-06" - "N"
"IPI00105532" - "3" - "2006-11-02" - "2009-09-03" - "N"
"IPI00221338" - "1" - "2003-04-10" - "2003-10-03" - "N"
"IPI00221338" - "2" - "2003-11-07" - "2005-02-05" - "N"
"IPI00221338" - "3" - "2005-03-07" - "2005-08-02" - "N"
"IPI00221338" - "4" - "2005-09-06" - "2006-09-06" - "N"
"IPI00221338" - "5" - "2006-10-06" - "2006-10-06" - "N"
"IPI00221338" - "6" - "2006-11-02" - "2007-10-24" - "N"
"IPI00221338" - "7" - "2007-11-13" - "2008-09-02" - "N"
"IPI00221338" - "8" - "2008-09-25" - "2009-06-17" - "N"
"IPI00411514" - "1" - "2004-06-02" - "2007-12-05" - "N"
"IPI00411514" - "2" - "2008-01-16" - "2010-06-17" - "N"
"IPI00412977" - "1" - "2004-06-02" - "2010-04-26" - "N"
"IPI00419266" - "1" - "2004-07-01" - "2010-04-26" - "N"
"IPI00465149" - "1" - "2004-10-04" - "2005-02-05" - "N"
"IPI00465149" - "2" - "2005-03-07" - "2005-05-10" - "N"
"IPI00465149" - "3" - "2005-06-03" - "2009-07-09" - "N"
"IPI00554793" - "1" - "2005-04-04" - "2007-01-17" - "N"
"IPI00554793" - "2" - "2007-02-21" - "2009-06-17" - "N"
"IPI00746388" - "1" - "2006-05-16" - "2007-10-24" - "N"
"IPI00746388" - "2" - "2007-11-13" - "2009-02-12" - "N"
"IPI00746388" - "3" - "2009-03-03" - "2009-06-17" - "N"
viritha kaza wrote:
> Hi Samuel,
> These are some of the IDs for which I did not get a match.
> IPI00055954
> IPI00221338
> IPI00465149
> IPI00554793
> IPI00028262
> IPI00412977
> IPI00105532
> IPI00411514
> IPI00746388
> IPI00419266
>
> Thanks,
> Viritha
>
> On Fri, Feb 18, 2011 at 2:49 AM, Samuel GRANJEAUD - IR/ICIM
> <granjeau at tagc.univ-mrs.fr> wrote:
>
> Hi,
>
> Could you give us a list of 10 unmatched?
>
> BR
>
>
> viritha kaza wrote:
>
> Hi
> thanks for the reply:
> As Samuel suggested, I used the following link:
> ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ipi.HUMAN.xrefs.gz.
> For the ones I did not find, I used the following code.
> Though I still don't get a 1:1 mapping, I got the Entrez ID and the
> gene symbol. The ipi_test file contains the list of IPI IDs that I
> want to convert.
> code:
> > source('http://bioconductor.org/biocLite.R')
> > biocLite("biomaRt")
> > library("biomaRt")
> > ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> > ipi = scan("ipi_test.txt", what = character(), sep = '\n', quote = "")
> > ipi_entrez = getBM(attributes = c("ipi", "entrezgene", "hgnc_symbol"),
>       filters = "ipi", values = ipi, mart = ensembl)
> > write.table(ipi_entrez, "ipi_entrez_test.txt", sep = '\t')
> I am still not getting a few. Is there any other method, or
> should I conclude that those IPI numbers don't have corresponding
> gene symbols?
> Thanks,
> Viritha
>
>
--
Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 11/24
http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique