[BioC] Retrieve aminoacid sequence starting from protein identifier

john seers (IFR) john.seers at bbsrc.ac.uk
Wed Jun 24 16:09:42 CEST 2009


Hi Giulio

Have a look at getSEQ and getGI in the annotate package. I think they do what you want.

These two lines were lifted out of one of them I think:



> accession <- "P10451"
> seq <- readLines(paste("http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?", "cmd=&txt=on&save=&cfm=&list_uids=", accession, "&", "db=nucleotide&extrafeat=16&term=&view=fasta&", "dispmax=20&SendTo=t&__from=&__to=&__strand=", sep = ""))
> sequence <- paste(seq[2:length(seq)], sep = "", collapse = "")
>
> seq
[1] ">gi|129260|sp|P10451.1|OSTP_HUMAN RecName: Full=Osteopontin; AltName: Full=Bone sialoprotein 1; AltName: Full=Secreted phosphoprotein 1; Short=SPP-1; AltName: Full=Urinary stone protein; AltName: Full=Nephropontin; AltName: Full=Uropontin; Flags: Precursor"
[2] "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQNAVSSEETNDFK"
[3] "QETLPSKSNESHDHMDDMDDEDDDDHVDSQDSIDSNDSDDVDDTDDSHQSDESHHSDESDELVTDFPTDL"
[4] "PATEVFTPVVPTVDTYDGRGDSVVYGLRSKSKKFRRPDIQYPDATDEDITSHMESEELNGAYKAIPVAQD"
[5] "LNAPSDWDSRGKDSYETSQLDDQSAETHSHKQSRLYKRKANDESNEHSDVIDSQELSKVSREFHSHEFHS"
[6] "HEDMLVVDPKSKEEDKHLKFRISHELDSASSEVN"
[7] ""
>
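
The step that turns the FASTA lines into a single string can be sketched on its own, without a network call. This is only a minimal sketch: fasta_to_sequence is a hypothetical helper name (it is not part of annotate), it assumes the input holds exactly one FASTA record, and the example lines are a shortened copy of the output shown above.

```r
# Sketch: collapse FASTA lines (as returned by readLines) into one sequence.
# fasta_to_sequence is a hypothetical helper, not a function from annotate.
# Assumption: the input contains a single FASTA record.
fasta_to_sequence <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop empty lines such as the trailing ""
  is_header <- startsWith(lines, ">")    # the header line starts with ">"
  if (!any(is_header)) stop("no FASTA header line found")
  list(
    header   = sub("^>", "", lines[is_header][1]),
    sequence = paste(lines[!is_header], collapse = "")
  )
}

# Shortened example input (first and last sequence lines only)
example_lines <- c(
  ">gi|129260|sp|P10451.1|OSTP_HUMAN RecName: Full=Osteopontin",
  "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQNAVSSEETNDFK",
  "HEDMLVVDPKSKEEDKHLKFRISHELDSASSEVN",
  ""
)
result <- fasta_to_sequence(example_lines)
print(result$header)
print(nchar(result$sequence))
```

The same helper should work on lines read from any single-record FASTA source, for example readLines("https://rest.uniprot.org/uniprotkb/P10451.fasta"), assuming that UniProt endpoint is reachable from your machine.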


Regards


John




-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Giulio Di Giovanni
Sent: 24 June 2009 14:43
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Retrieve aminoacid sequence starting from protein identifier


 

Hi all,

I've looked through the archive without finding an answer. Maybe it's too easy a question...

Anyway, I'd like to know whether there is a command or a package that can help me retrieve the amino acid sequence starting from the protein identifier, through a link with UniProt or similar.

Let's say I have:

P10451, Human

I would like to obtain:

MRIAVICFCL LGITCAIPVK QADSGSSEEK QLYNKYPDAV ATWLNPDPSQ KQNLLAPQNA VSSEETNDFK QETLPSKSNE SHDHMDDMDD EDDDDHVDSQ DSIDSNDSDD VDDTDDSHQS DESHHSDESD ELVTDFPTDL PATEVFTPVV PTVDTYDGRG DSVVYGLRSK SKKFRRPDIQ YPDATDEDIT SHMESEELNG AYKAIPVAQD LNAPSDWDSR GKDSYETSQL DDQSAETHSH KQSRLYKRKA NDESNEHSDV IDSQELSKVS REFHSHEFHS HEDMLVVDPK SKEEDKHLKF RISHELDSAS SEVN

Thanks in advance,

Giulio.
