[BioC] Protein/peptide mass
Thomas Girke
thomas.girke at ucr.edu
Thu May 25 15:48:50 CEST 2006
John,
Here is how I usually obtain MW info for many input files using pepstats
in a shell for loop:
for i in *.fasta; do pepstats -sequence "$i" -stdout -auto >> pepstats; done
The argument '-auto' turns off EMBOSS's interactive prompts, and '-stdout'
sends the report to standard output instead of a file.
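To get from that concatenated output to something R can import, the
molecular weights can be pulled out with awk. This is only a sketch: the
function name mw_table is made up for illustration, and it assumes each
pepstats report starts with a line like 'PEPSTATS of <ID> from 1 to N'
followed by a line like 'Molecular weight = <MW>   Residues = <N>'. Please
check that against the output of your EMBOSS version.

```shell
# Hypothetical helper: read concatenated pepstats reports on stdin and
# print one "ID<TAB>MW" line per report.
# Assumed report layout (verify with your EMBOSS version):
#   PEPSTATS of <ID> from <start> to <end>
#   Molecular weight = <MW>   Residues = <n>
mw_table() {
    awk '
        # Remember the sequence ID from the report header line.
        $1 == "PEPSTATS" && $2 == "of" { id = $3 }
        # Print the ID together with the fourth field of the weight line.
        $1 == "Molecular" && $2 == "weight" { printf "%s\t%s\n", id, $4 }
    '
}
```

For example, 'mw_table < pepstats > mw.tsv' should give a two-column file
that read.table("mw.tsv", sep="\t") can read into a data frame.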
If your peptides are in a fasta batch file then you can split them with
'seqret' using the argument '-ossingle'.
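For a single peptide that exists only as a string, a file may not be needed
at all. As far as I know, EMBOSS sequence addresses accept an 'asis::'
prefix that treats the rest of the argument as the sequence itself. The
wrapper below is a sketch under that assumption; the name pepstats_string
is hypothetical.

```shell
# Hypothetical helper: run pepstats on a peptide given as a plain string.
# Assumes the EMBOSS 'asis::' sequence address is available, so that no
# input file has to be written first.
pepstats_string() {
    pepstats -sequence "asis::$1" -stdout -auto
}
```

Called as 'pepstats_string ACDEFGHIK', it should print the usual pepstats
report to the terminal, and the same command line could be handed to
system() from within R.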
I am not sure how accurately pepstats calculates MWs.
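One way to cross-check a few values is to sum average residue masses by
hand and add one water for the free termini. The sketch below does that in
awk. The function name peptide_mw is made up, the mass table is the
commonly tabulated set of average residue masses (not necessarily the one
pepstats uses), and it only covers unmodified peptides made of the 20
standard amino acids, so small differences from pepstats are to be
expected.

```shell
# Hypothetical helper: average molecular weight of an unmodified peptide.
# MW = sum of average residue masses + one water (free N- and C-terminus).
peptide_mw() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # Average residue masses in Da, as commonly tabulated.
            m = "A 71.0788 R 156.1875 N 114.1038 D 115.0886 C 103.1388"
            m = m " E 129.1155 Q 128.1307 G 57.0519 H 137.1411 I 113.1594"
            m = m " L 113.1594 K 128.1741 M 131.1926 F 147.1766 P 97.1167"
            m = m " S 87.0782 T 101.1051 W 186.2132 Y 163.1760 V 99.1326"
            n = split(m, t, " ")
            for (i = 1; i < n; i += 2) mass[t[i]] = t[i + 1]
            water = 18.01524
        }
        {
            seq = toupper($0)
            total = water
            for (i = 1; i <= length(seq); i++) {
                aa = substr(seq, i, 1)
                # Stop with an error on anything outside the 20 residues.
                if (!(aa in mass)) {
                    print "unknown residue: " aa > "/dev/stderr"
                    exit 1
                }
                total += mass[aa]
            }
            printf "%.4f\n", total
        }'
}
```

'peptide_mw G' prints 75.0671, which agrees with the usual molecular
weight of glycine. A sequence containing any other letter (for example X)
makes the function fail with a non-zero exit status.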
Thomas
On Thu 05/25/06 09:07, john seers (IFR) wrote:
>
>
> Hi Thomas
>
> Thank you very much for your reply.
>
> There are some functions in the packages "seqinr" and "Biostrings", in
> fact quite a lot, but not one to calculate the mass of a peptide that I
> can find. So I was being forced down the route of having to call an
> EMBOSS program and parse the results. The problem with that is the
> interface is not easy - often needs a file as input in some standard
> format - not just passing in a string on the command line.
>
> The other way I thought might be possible was to use the online
> facilities of something like Expasy's "PeptideMass" but I cannot get
> that to work. Does anybody have any idea if that is possible?
>
> Regards
>
> John Seers
>
> ---
>
> John Seers
> Institute of Food Research
> Norwich Research Park
> Colney
> Norwich
> NR4 7UA
>
>
> tel +44 (0)1603 251490
> fax +44 (0)1603 255167
> e-mail john.seers at bbsrc.ac.uk
> e-disclaimer at http://www.ifr.ac.uk/edisclaimer/
>
> Web sites:
>
> www.ifr.ac.uk
> www.foodandhealthnetwork.com
>
>
> -----Original Message-----
> From: Thomas Girke [mailto:thomas.girke at ucr.edu]
> Sent: 24 May 2006 18:35
> To: john seers (IFR)
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Protein/peptide mass
>
>
> John,
> Allow me to post some comments to your question rather than providing an
> immediate answer.
>
> On UNIX-type OSs, like Linux or Mac OS X, I usually run EMBOSS
> command-line programs directly from R using the system("myemboss_program")
> command and slurp the results into R with its standard data import
> functions (e.g. read.table, readLines). The import step often requires
> some knowledge of R's regular expression utilities for reformatting the
> results as needed.
> Knowledge of BioPerl is often very helpful as well. The advantage of this
> approach is that one can post-analyze and plot the output of almost any
> type of bio- or drug-informatics program in R. However, to do this one
> needs some basic knowledge of R, mostly for importing very variable data
> structures.
>
> For the future it would be very useful to have some BioC utilities that
> will allow a more user-friendly data import from EMBOSS, BLAST and
> hundreds of other non-R-based bioinformatics programs.
>
> I would be interested to know whether members of this list are working
> on packages that will facilitate this integration with external sequence
> analysis tools.
>
> Thomas
>
>
> On Wed 05/24/06 16:31, john seers (IFR) wrote:
> > Hello All
> >
> > Apologies in advance if this is an obvious question but I have searched
> > and cannot find an answer or a straightforward way to do it.
> >
> > Is there a way to calculate the mass of a protein/peptide using
> > R/Bioconductor? i.e. like the Expasy "PeptideMass" web page or like the
> > EMBOSS pepstats?
> >
> > Regards
> >
> > John Seers
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
> --
> Thomas Girke, Ph.D.
> 1008 Noel T. Keen Hall
> Center for Plant Cell Biology (CEPCEB)
> University of California
> Riverside, CA 92521
>
> E-mail: thomas.girke at ucr.edu
> Website: http://faculty.ucr.edu/~tgirke
> Ph: 951-827-2469
> Fax: 951-827-4437
>
--
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437