[Bioc-devel] IPI numbers in annotation packages

Marc Carlson mrjc42 at gmail.com
Mon Oct 5 16:16:31 CEST 2015


You need to scroll down that script a ways...  Look for 'yeast'.
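
For anyone following along, here is a minimal sketch of that kind of search in R. It is not code from the build scripts: the helper name `find_in_script` and the throwaway demo file are made up for illustration, and in practice you would point it at your own checkout of processDataForBuild.R. It lists each line that mentions a pattern and flags whether that line is commented out.

```r
# Hypothetical helper (not part of any Bioconductor package): return every
# line of a script that mentions `pattern`, with its line number and a flag
# telling whether the line starts with a comment marker.
find_in_script <- function(path, pattern, comment_prefix = "#") {
  lines <- readLines(path, warn = FALSE)
  hits <- grep(pattern, lines, ignore.case = TRUE)
  data.frame(
    line = hits,
    commented = startsWith(trimws(lines[hits]), comment_prefix),
    text = lines[hits],
    stringsAsFactors = FALSE
  )
}

# Demo on a throwaway file so the sketch runs anywhere; the contents are
# invented and only mimic the shape of a build script.
demo <- tempfile(fileext = ".R")
writeLines(c(
  'speciesList = c("chipsrc_human.sqlite")',
  '#  "chipsrc_fly.sqlite",',
  '## yeast needs special handling',
  'processYeast <- function() NULL'
), demo)

res <- find_in_script(demo, "yeast")
print(res)
```

The same helper works on the SQL scripts if you pass `comment_prefix = "--"`, which makes it easy to see whether an 'INSERT INTO pfam' hit is live or commented out.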

On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Marc,
>
> That script has this in it:
>
> ## For now just get data for the ones that we have traditionally supported
> ## I don't even know if the other species are available...
> speciesList = c("chipsrc_human.sqlite",
>   "chipsrc_rat.sqlite",
>   "chipsrc_chicken.sqlite",
>   "chipsrc_zebrafish.sqlite",
>   #  "chipsrc_worm.sqlite",
>   #  "chipsrc_fly.sqlite",
>   "chipsrc_mouse.sqlite",
>   "chipsrc_bovine.sqlite"
>   #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
>   ## But to activate arabidopsis, remember you have to pre-add the tables...
>   #  "chipsrc_canine.sqlite",
>   #  "chipsrc_rhesus.sqlite",
>   #  "chipsrc_chimp.sqlite",
>   #  "chipsrc_anopheles.sqlite"
>   )
>
> And there is no mention of yeast anywhere. If I search all the scripts for,
> say, 'INSERT INTO pfam', I get:
>
> custom_anno/script/bindb.sql
> 328:INSERT INTO pfam
>
> pfam/script/srcdb_pfam.sql
> 202:-- INSERT INTO pfamb
>
> organism_annotation/script/bindb_yeast.sql
> 441:-- INSERT INTO pfam
>
> yeast/script/bindb.sql
> 241:-- INSERT INTO pfam
>
> The first one is just doing all the metadata tables, and the other three
> are in code blocks that are commented out. Is it possible that you used a
> script that didn't make it into svn?
>
> Jim
>
>
>
> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrjc42 at gmail.com> wrote:
>
>> Hi Jim,
>>
>> You asked me on Friday where the PFAM IDs for yeast came from, and I
>> couldn't recall because at the moment I was at Seattle Children's (and thus
>> nowhere near my copy of my source code).  But I also said I would look into
>> it for you later (and I have).  Here is what my code tells me:  ever
>> since IPI shut down, we have been getting the PFAM and IPI data from
>> UniProt.  There is a script in the UniProt.ws package
>> called processDataForBuild.R that is supposed to be called by the script
>> "src_build.sh" (it's the last thing that script does).  That code should
>> get the PFAM data for yeast for you.  Please note that yeast required a
>> lot of special code to get it processed.  Nothing with yeast annotations is
>> ever easy.  It's like karmic accounting to compensate for all the bread and
>> beer.  ;)
>>
>> Let me know if you need any more explanations about what is in there.
>> Because of the crazy timing, before I left I built and pushed into devel a
>> fresh set of .DB0s and core packages (in late August) just in case it was
>> too crazy to do a refresh right now.  But it sounds like you won't need
>> that.
>>
>>
>>   Marc
>>
>>
>>
>> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>
>>> I am building the annotation db0 packages for the upcoming Bioconductor
>>> release, which are used to generate all the orgDb and chip annotation
>>> packages that we distribute. Up to the previous release we have always
>>> included IPI identifiers (as part of the table containing the PROSITE and
>>> PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer
>>> maintained (since 2011), and UniProt, which is where we got data for the
>>> last few releases, has now dropped support as well.
>>>
>>> Given that this annotation source is no longer maintained, I decided to
>>> exclude these IDs from the current build of the following db0 packages:
>>>
>>>    - rat.db0
>>>    - chicken.db0
>>>    - zebrafish.db0
>>>    - mouse.db0
>>>    - bovine.db0
>>>    - human.db0
>>>
>>> In addition, it is not clear to me (nor can Marc recall) where the data for
>>> PFAM in the yeast.db0 package comes from. Given that we are pretty far
>>> behind schedule for these packages, I have excluded that table as well.
>>>
>>> If this will break anybody's package, or if there are people who rely on
>>> these IDs, I can just parse them out of the last release and deprecate
>>> them, so you will have the IDs for one more release. However, if nobody
>>> cares about such things, I will just go with what we have. Please speak up
>>> if this will affect you.
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
