[Bioc-devel] Creating an org.Hs.uniprot.db package

Aditya Bhagwat adb2018 at qatar-med.cornell.edu
Wed Sep 13 13:00:54 CEST 2017


Hey guys,

Thanks for your responses!
This is exactly the kind of feedback I wanted, to make sure that what I intend makes sense.

Resource to be used is UniprotKB.  Preview the first 10 entries:
http://www.uniprot.org/uniprot/?sort=&desc=&compress=no&query=&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22&limit=10&force=no&preview=true&format=txt

What will it look like? Similar to org.Hs.eg.db, using the AnnotationDbi::select interface etc.

Why not org.Hs.eg.db?

- Many UniProt accessions are simply not present in org.Hs.eg.db. Take non-canonical isoforms: they are badly represented in org.Hs.eg.db, but are essential in LCMS proteomics.

- org.Hs.eg.db does not offer an easy way to map UniProt accessions to the UniProt summary. I would include that in the org.Hs.uniprot.db package.

- In LCMS proteomics, protein identification is performed by comparing the observed MS spectra to those you would expect from an organism-specific protein sequence database. Using the same protein sequence database for annotation as for identification would provide a one-to-one mapping between analysis and annotation.

Why not biomaRt?

- A reasonably deep LCMS proteomics experiment quantifies 7000 proteins. Retrieving annotation for these through biomaRt would be slow (an overnight activity). I want something that works instantaneously.

- As far as I remember, the UniProt summary field (a paragraph on what is currently known about a protein) cannot actually be accessed from within biomaRt.
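
To make the idea of a local accession-to-summary lookup concrete, here is a minimal sketch in Python. It is not the proposed R package and not how AnnotationDbi works internally. It only illustrates parsing UniProtKB flat text (the format=txt layout linked above) into a local SQLite table, so that lookups need no network round trip. The two sample entries, their accessions (X00001 etc.) and the helper names parse_entries, build_db and lookup are all made up for this illustration. Only the FUNCTION comment is kept as the "summary", which is an assumption about which field is meant.

```python
import sqlite3

# Two made-up entries in the UniProtKB flat-text layout: a two-letter line
# code, three spaces, then content; "//" closes an entry.
# These are NOT real UniProt records.
SAMPLE = """\
ID   DEMO1_HUMAN             Reviewed;         120 AA.
AC   X00001; X00002;
CC   -!- FUNCTION: Hypothetical kinase used only for this
CC       illustration.
CC   -!- SUBUNIT: Monomer.
//
ID   DEMO2_HUMAN             Reviewed;          80 AA.
AC   X00003;
CC   -!- FUNCTION: Second placeholder entry.
//
"""


def parse_entries(text):
    """Yield (entry_name, accessions, function_text) for each entry."""
    name, accessions, function, in_function = None, [], [], False
    for line in text.splitlines():
        code, content = line[:2], line[5:]
        if code == "//":
            yield name, accessions, " ".join(function)
            name, accessions, function, in_function = None, [], [], False
        elif code == "ID":
            name = content.split()[0]
        elif code == "AC":
            accessions += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "CC":
            if content.startswith("-!- "):
                # A new comment topic starts; keep only the FUNCTION topic.
                in_function = content.startswith("-!- FUNCTION:")
                if in_function:
                    function.append(content[len("-!- FUNCTION:"):].strip())
            elif in_function:
                # Continuation line of the FUNCTION comment.
                function.append(content.strip())


def build_db(text):
    """Load parsed entries into an in-memory SQLite table keyed by accession."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE uniprot "
        "(accession TEXT PRIMARY KEY, entry_name TEXT, summary TEXT)"
    )
    for name, accessions, summary in parse_entries(text):
        # Every accession on the AC line points at the same entry.
        con.executemany(
            "INSERT INTO uniprot VALUES (?, ?, ?)",
            [(acc, name, summary) for acc in accessions],
        )
    con.commit()
    return con


def lookup(con, accessions):
    """Return {accession: (entry_name, summary)} for the accessions found."""
    query = "SELECT entry_name, summary FROM uniprot WHERE accession = ?"
    found = {}
    for acc in accessions:
        row = con.execute(query, (acc,)).fetchone()
        if row is not None:
            found[acc] = row
    return found


con = build_db(SAMPLE)
result = lookup(con, ["X00002", "X00003", "X99999"])
for acc, (name, summary) in result.items():
    print(acc, name, summary)
```

Accessions that are absent from the table (X99999 here) are simply left out of the result. An R package built on AnnotationDbi would expose a comparable lookup through select() rather than through hand-written SQL.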

What do you guys think?
Thanks for your feedback!

Cheers,

Aditya


From: Karim Mezhoud [mailto:kmezhoud at gmail.com]
Sent: Wednesday, September 13, 2017 1:35 PM
To: Vincent Carey
Cc: Aditya Bhagwat; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Creating an org.Hs.uniprot.db package

Hi,
In general, LCMSMS generates mass/charge data for amino acids or peptides.
The goal is to identify which protein the peptides belong to.
The software used with LCMSMS can match the peptides to the UniProt database, and ranks putative proteins by score.
Could you tell us what the advantage of org.Hs.uniprot.db is over the default UniprotKB?
Thanks,
Karim


On Wed, Sep 13, 2017 at 11:19 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
can you say a little more about what resource will be tapped and what it
will look like?  you can already use uniprot identifiers as keys into
org.Hs.eg.db

On Tue, Sep 12, 2017 at 9:05 AM, Aditya Bhagwat <adb2018 at qatar-med.cornell.edu> wrote:

> Hey guys,
>
> I love the org.Hs.eg.db package (and similar others for other organisms).
>
> I work a lot with LCMSMS proteomics data, and I have always missed a
> similar org.Hs.uniprot.db package, so I am thinking of creating that (and
> then sharing it on BioC, to benefit fellow proteomics R users).
>
> Would you agree that this is of general interest? (Or does this in some
> form already exist and have I overlooked it?)
>
> Thanks for your feedback!
>
> Cheers,
>
> Aditya
>
>
>
> Disclaimer: This email and its attachments may be confidential and are
> intended solely for the use of the individual to whom it is addressed. If
> you are not the intended recipient, any reading, printing, storage,
> disclosure, copying or any other action taken in respect of this e-mail is
> prohibited and may be unlawful. If you are not the intended recipient,
> please notify the sender immediately by using the reply function and then
> permanently delete what you have received..
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
