[Bioc-devel] IPI numbers in annotation packages

James W. MacDonald jmacdon at uw.edu
Mon Oct 5 16:47:13 CEST 2015


Ah. That's the problem. The script in getdb.sh has

R --slave < /home/ubuntu/cpb_anno/AnnotationBuildPipeline/annosrc/uniprot/script/uniprot.ws/inst/script/processDataForBuild.R

which is a modification of what is in svn (changed to match the directory
structure of the AMI) and calls a script in a local copy of the UniProt.ws
package. The local copy doesn't have any code for yeast, but the 'real'
installed version of UniProt.ws does. I assumed the local copy was special
and that I should use it, because you were specifically calling that one
rather than the installed package.

annosrc$ grep -i yeast uniprot/script/uniprot.ws/inst/script/processDataForBuild.R
annosrc$
annosrc$ grep -i yeast ~/R/x86_64-pc-linux-gnu-library/3.2/UniProt.ws/script/processDataForBuild.R
## Now for special treatment for missing stuff from yeast.
getYeastData <- function(dbFile, db){
doYeastInserts <- function(db, table, data){
## just one more run through to just do what is needed to get pfam into yeast.
species <- 'chipsrc_yeast.sqlite'
res <- getYeastData(species, db)
doYeastInserts(db, "pfam", res[["pfam"]])
doYeastInserts(db, "smart", res[["smart"]])
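
The same check can be repeated across any number of copies of a script. What follows is only a minimal sketch: check_copies is a hypothetical helper, not part of the build pipeline, and the temporary files are stand-ins for the two copies of processDataForBuild.R.

```shell
#!/bin/sh
# Hypothetical helper: report how many lines in each file match a pattern
# (case-insensitive), so differing copies of a script stand out.
# Usage: check_copies PATTERN FILE...
check_copies() {
    pattern=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            printf 'unreadable: %s\n' "$f"
        else
            # grep -c prints the number of matching lines (0 if none)
            n=$(grep -ic -- "$pattern" "$f")
            printf '%s matching line(s): %s\n' "$n" "$f"
        fi
    done
}

# Demonstration with stand-in files (assumed contents, not the real scripts)
tmp=$(mktemp -d)
printf 'speciesList = c("chipsrc_human.sqlite")\n' > "$tmp/local.R"
printf 'getYeastData <- function(dbFile, db){\n' > "$tmp/installed.R"
check_copies yeast "$tmp/local.R" "$tmp/installed.R"
rm -rf "$tmp"
```

A count of zero for one copy and a nonzero count for the other points to the kind of mismatch described above.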


Thanks!

Jim



On Mon, Oct 5, 2015 at 10:16 AM, Marc Carlson <mrjc42 at gmail.com> wrote:

> You need to scroll down that script a ways...  Look for 'yeast'.
>
> On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Marc,
>>
>> That script has this in it:
>>
>> ## For now just get data for the ones that we have traditionally supported
>> ## I don't even know if the other species are available...
>> speciesList = c("chipsrc_human.sqlite",
>>   "chipsrc_rat.sqlite",
>>   "chipsrc_chicken.sqlite",
>>   "chipsrc_zebrafish.sqlite",
>>   #  "chipsrc_worm.sqlite",
>>   #  "chipsrc_fly.sqlite",
>>   "chipsrc_mouse.sqlite",
>>   "chipsrc_bovine.sqlite"
>>   #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
>>   ## But to activate arabidopsis, remember you have to pre-add the tables...
>>   #  "chipsrc_canine.sqlite",
>>   #  "chipsrc_rhesus.sqlite",
>>   #  "chipsrc_chimp.sqlite",
>>   #  "chipsrc_anopheles.sqlite"
>>   )
>>
>> And there is no mention of yeast anywhere. If I search all the scripts
>> for, say, 'INSERT INTO pfam', I get
>>
>> custom_anno/script/bindb.sql
>> 328:INSERT INTO pfam
>>
>> pfam/script/srcdb_pfam.sql
>> 202:-- INSERT INTO pfamb
>>
>> organism_annotation/script/bindb_yeast.sql
>> 441:-- INSERT INTO pfam
>>
>> yeast/script/bindb.sql
>> 241:-- INSERT INTO pfam
>>
>> The first one is just doing all the metadata tables, and the other three
>> are in code blocks that are commented out. Is it possible that you used a
>> script that didn't make it into svn?
>>
>> Jim
>>
>>
>>
>> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrjc42 at gmail.com> wrote:
>>
>>> Hi Jim,
>>>
>>> You asked me on Friday where the PFAM Ids for yeast came from and I
>>> couldn't recall because at the moment I was at Seattle Children's (and thus
>>> nowhere near my copy of my source code).  But I also said I would look into
>>> it for you later (and I have).  Here is what my code tells me:  So ever
>>> since IPI shut down, we have been getting the PFAM and IPI data from
>>> UniProt.  There is a script in the UniProt.ws package
>>> called processDataForBuild.R that is supposed to be called by the script
>>> "src_build.sh" (it's the last thing that script does).  That code should
>>> get the pfam data from yeast for you.  Please note that yeast required a
>>> lot of special code to get it processed.  Nothing with yeast annotations is
>>> ever easy.  It's like karmic accounting to compensate for all the bread and
>>> beer.  ;)
>>>
>>> Let me know if you need any more explanations about what is in there.
>>> Because of the crazy timing, before I left I built and pushed into devel a
>>> fresh set of .DB0s and core packages (in late August) just in case it was
>>> too crazy to do a refresh right now.  But it sounds like you won't need
>>> that.
>>>
>>>
>>>   Marc
>>>
>>>
>>>
>>> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>>> I am building the annotation db0 packages for the upcoming Bioconductor
>>>> release, which are used to generate all the orgDb and chip annotation
>>>> packages that we distribute. Up to the previous release we have always
>>>> included IPI identifiers (as part of the table containing the PROSITE and
>>>> PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer
>>>> maintained (since 2011), and UniProt, which is where we got data for the
>>>> last few releases, has now dropped support as well.
>>>>
>>>> Given that this annotation source is no longer maintained, I decided to
>>>> exclude these IDs from the current build of the following db0 packages:
>>>>
>>>>    - rat.db0
>>>>    - chicken.db0
>>>>    - zebrafish.db0
>>>>    - mouse.db0
>>>>    - bovine.db0
>>>>    - human.db0
>>>>
>>>> In addition, it is not clear to me (nor can Marc recall) where the data for
>>>> PFAM in the yeast.db0 package comes from. Given that we are pretty far
>>>> behind schedule for these packages, I have excluded that table as well.
>>>>
>>>> If this will break anybody's package, or if there are people who rely on
>>>> these IDs, I can just parse out of the last release and deprecate, so you
>>>> will have the IDs for one more release. However, if nobody cares about such
>>>> things, I will just go with what we have. Please speak up if this will
>>>> affect you.
>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>>
>
>


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
