[Bioc-devel] Creating an org.Hs.uniprot.db package

Karim Mezhoud kmezhoud at gmail.com
Wed Sep 13 15:18:02 CEST 2017


Hi,
Simply put, for LCMSMS protein identification it is better to use a curated
and verified protein sequence database (backed by experiments, assays and
publications).
UniProt KnowledgeBase is known to be the most curated one (even manually).
It should be possible to convert and improve UniProtKB (Hs) into
org.Hs.uniprot.db, if that has not been done yet.
Karim



On Wed, Sep 13, 2017 at 12:00 PM, Aditya Bhagwat <
adb2018 at qatar-med.cornell.edu> wrote:

> Hey guys,
>
>
>
> Thanks for your responses!
>
> Exactly the kind of feedback I wanted, to ensure that what is being
> intended makes sense.
>
>
>
> Resource to be used is UniprotKB.  Preview the first 10 entries:
>
> <http://www.uniprot.org/uniprot/?sort=&desc=&compress=no&query=&fil=organism:%22Homo%20sapiens%20(Human)%20%5b9606%5d%22&limit=10&force=no&preview=true&format=txt>
>
>
>
> What will it look like? Similar to org.Hs.eg.db, using the
> AnnotationDbi::select interface etc.
>
>
>
> Why not org.Hs.eg.db?
>
> * Many UniProt accessions are simply not present in org.Hs.eg.db.
> Take non-canonical isoforms: they are poorly represented in org.Hs.eg.db,
> but are essential in LCMS proteomics.
>
> * org.Hs.eg.db does not offer an easy way to map UniProt
> accessions to the UniProt summary. I would include that in the
> org.Hs.uniprot.db package.
>
> * In LCMS proteomics, protein identification is performed by
> comparing the observed MS spectra to those you would expect from an
> organism-specific protein sequence database. Using the same protein
> sequence database for annotation as is being used for identification would
> provide a one-to-one mapping between analysis and annotation.
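The isoform point above can be illustrated with a minimal Python sketch. It relies on the UniProt convention that an isoform accession is the base accession followed by "-<number>" (for example P04637-2). The function names `split_accession` and `lookup_with_fallback` and the toy `annotation` dictionary are hypothetical and are not part of any existing package; the regular expression is deliberately loose and does not validate the full accession format.

```python
import re

# Loose pattern: a base accession, optionally followed by "-<isoform number>".
# This is an illustrative assumption, not a full UniProt accession validator.
_ISOFORM_RE = re.compile(r"^(?P<base>[A-Z0-9]+)(?:-(?P<isoform>\d+))?$")


def split_accession(accession):
    """Return (base_accession, isoform_number), with None for no isoform suffix."""
    match = _ISOFORM_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a recognisable accession: {accession!r}")
    isoform = match.group("isoform")
    return match.group("base"), (int(isoform) if isoform else None)


def lookup_with_fallback(accession, annotation):
    """Look up the exact accession first, then fall back to the base accession."""
    if accession in annotation:
        return annotation[accession]
    base, _ = split_accession(accession)
    return annotation.get(base)


# Toy, gene-centric annotation that only knows the canonical accession.
annotation = {"P04637": "TP53"}
print(split_accession("P04637-2"))
print(lookup_with_fallback("P04637-2", annotation))
```

A gene-centric table keyed only by canonical accessions would need such a fallback for every isoform hit, whereas a table built directly from the identification database would contain the isoform accession itself.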
>
>
>
> Why not biomaRt?
>
> * A reasonably deep LCMS proteomics experiment quantifies 7000
> proteins. Retrieving annotation for these through biomaRt would be slow (an
> overnight activity). I want something that works instantaneously.
>
> * From what I remember, you cannot actually access the UniProt
> summary field (which gives a paragraph on what is currently known about a
> protein) from within biomaRt.
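The "works instantaneously" argument above rests on keeping the annotation in a local database and querying it by accession. The following Python sketch shows that idea with an in-memory SQLite table and a `select`-style helper. The table name `uniprot`, its columns, and the functions `build_db` and `select` are assumptions made for illustration; they do not describe the schema of org.Hs.eg.db or of the proposed package, and the summary text is a placeholder.

```python
import sqlite3

ALLOWED_COLUMNS = ("accession", "symbol", "summary")  # hypothetical schema


def build_db(rows):
    """Create an in-memory table of (accession, symbol, summary) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uniprot (accession TEXT PRIMARY KEY, symbol TEXT, summary TEXT)"
    )
    conn.executemany("INSERT INTO uniprot VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def select(conn, keys, columns, chunk_size=500):
    """Return one tuple (key, col1, ...) per key, with None for missing keys."""
    for column in columns:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
    keys = list(keys)
    found = {}
    # Query in chunks so that thousands of keys stay within SQLite's
    # limit on the number of bound parameters per statement.
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        sql = (
            f"SELECT accession, {', '.join(columns)} FROM uniprot "
            f"WHERE accession IN ({placeholders})"
        )
        for row in conn.execute(sql, chunk):
            found[row[0]] = row[1:]
    missing = (None,) * len(columns)
    return [(key, *found.get(key, missing)) for key in keys]


conn = build_db([("P04637", "TP53", "placeholder summary text")])
print(select(conn, ["P04637", "XXXXXX"], ["symbol", "summary"]))
```

Because the primary key is indexed and nothing leaves the local machine, a lookup of a few thousand accessions should be a matter of a handful of local queries rather than remote requests.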
>
>
>
> What do you guys think?
>
> Thanks for your feedback!
>
>
>
> Cheers,
>
>
>
> Aditya
>
>
>
>
>
> *From:* Karim Mezhoud [mailto:kmezhoud at gmail.com]
> *Sent:* Wednesday, September 13, 2017 1:35 PM
> *To:* Vincent Carey
> *Cc:* Aditya Bhagwat; bioc-devel at r-project.org
> *Subject:* Re: [Bioc-devel] Creating an org.Hs.uniprot.db package
>
>
>
> Hi,
>
> In general, LCMSMS generates mass/charge data for amino acids or peptides.
>
> The goal is to identify which protein the peptides belong to.
>
> The software used with LCMSMS can match the peptides against the UniProt
> database, and ranks putative proteins by score.
>
> Could you tell us what the benefit of org.Hs.uniprot.db is compared with
> the default UniprotKB?
>
> Thanks,
>
> Karim
>
>
>
>
>
> On Wed, Sep 13, 2017 at 11:19 AM, Vincent Carey <
> stvjc at channing.harvard.edu> wrote:
>
> can you say a little more about what resource will be tapped and what it
> will look like?  you can
> already use uniprot identifiers as keys into org.Hs.eg.db
>
> On Tue, Sep 12, 2017 at 9:05 AM, Aditya Bhagwat <
> adb2018 at qatar-med.cornell.edu> wrote:
>
> > Hey guys,
> >
> > I love the org.Hs.eg.db package (and similar others for other organisms).
> >
> > I work a lot with LCMSMS proteomics data, and I have always missed a
> > similar org.Hs.uniprot.db package, so I am thinking of creating that (and
> > then sharing it on BioC, to benefit fellow proteomics R users).
> >
> > Would you agree that this is of general interest? (Or does this in some
> > form already exist and have I overlooked it?)
> >
> > Thanks for your feedback!
> >
> > Cheers,
> >
> > Aditya
> >
> >
> >
> > Disclaimer: This email and its attachments may be confidential and are
> > intended solely for the use of the individual to whom it is addressed. If
> > you are not the intended recipient, any reading, printing, storage,
> > disclosure, copying or any other action taken in respect of this e-mail is
> > prohibited and may be unlawful. If you are not the intended recipient,
> > please notify the sender immediately by using the reply function and then
> > permanently delete what you have received.
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
