[Bioc-devel] IPI numbers in annotation packages
James W. MacDonald
jmacdon at uw.edu
Mon Oct 5 16:47:13 CEST 2015
Ah. That's the problem. The script in getdb.sh has
R --slave < /home/ubuntu/cpb_anno/AnnotationBuildPipeline/annosrc/uniprot/script/uniprot.ws/inst/script/processDataForBuild.R
which is a modification of what is in svn (to match the directory structure
of the AMI) and which calls a script in a local copy of the UniProt.ws
package. The local copy doesn't have any code for yeast, but the 'real'
version (the installed UniProt.ws package) does. I assumed the local copy was
special and that I should use it, since you were specifically using it rather
than an installed package.
annosrc$ grep -i yeast uniprot/script/uniprot.ws/inst/script/processDataForBuild.R
annosrc$
annosrc$ grep -i yeast ~/R/x86_64-pc-linux-gnu-library/3.2/UniProt.ws/script/processDataForBuild.R
## Now for special treatment for missing stuff from yeast.
getYeastData <- function(dbFile, db){
doYeastInserts <- function(db, table, data){
## just one more run through to just do what is needed to get pfam into yeast.
species <- 'chipsrc_yeast.sqlite'
res <- getYeastData(species, db)
doYeastInserts(db, "pfam", res[["pfam"]])
doYeastInserts(db, "smart", res[["smart"]])
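A minimal shell sketch of the check above, assuming the two script paths from the transcript: it reports which copy of processDataForBuild.R defines the yeast-specific function getYeastData. The helper name `yeast_code_report` is hypothetical, and using `getYeastData` as the marker is an assumption based only on the grep output above.

```shell
# Hypothetical helper (not part of AnnotationBuildPipeline or UniProt.ws):
# for each script path given, report whether it defines getYeastData,
# the yeast-specific function found in the installed UniProt.ws copy.
yeast_code_report() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing: $f"          # path absent or unreadable
    elif grep -q 'getYeastData' "$f"; then
      echo "yeast: $f"            # yeast handling present
    else
      echo "no-yeast: $f"         # script has no yeast handling
    fi
  done
}
```

Run from `annosrc` with the local path and the installed path from the transcript, it would be expected to print `no-yeast:` for the local copy and `yeast:` for the installed one. If R is available, `Rscript -e 'cat(system.file("script", "processDataForBuild.R", package = "UniProt.ws"))'` prints the installed copy's path without hard-coding the library directory.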
Thanks!
Jim
On Mon, Oct 5, 2015 at 10:16 AM, Marc Carlson <mrjc42 at gmail.com> wrote:
> You need to scroll down that script a ways... Look for 'yeast'.
>
> On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Marc,
>>
>> That script has this in it:
>>
>> ## For now just get data for the ones that we have traditionally supported
>> ## I don't even know if the other species are available...
>> speciesList = c("chipsrc_human.sqlite",
>> "chipsrc_rat.sqlite",
>> "chipsrc_chicken.sqlite",
>> "chipsrc_zebrafish.sqlite",
>> # "chipsrc_worm.sqlite",
>> # "chipsrc_fly.sqlite",
>> "chipsrc_mouse.sqlite",
>> "chipsrc_bovine.sqlite"
>> # "chipsrc_arabidopsis.sqlite" ## this is available and could be "activated"
>> ## But to activate arabidopsis, remember you have to pre-add the tables...
>> # "chipsrc_canine.sqlite",
>> # "chipsrc_rhesus.sqlite",
>> # "chipsrc_chimp.sqlite",
>> # "chipsrc_anopheles.sqlite"
>> )
>>
>> And there is no mention of yeast anywhere. If I search all the scripts
>> for, say, 'INSERT INTO pfam', I get
>>
>> custom_anno/script/bindb.sql
>> 328:INSERT INTO pfam
>>
>> pfam/script/srcdb_pfam.sql
>> 202:-- INSERT INTO pfamb
>>
>> organism_annotation/script/bindb_yeast.sql
>> 441:-- INSERT INTO pfam
>>
>> yeast/script/bindb.sql
>> 241:-- INSERT INTO pfam
>>
>> The first one is just doing all the metadata tables, and the other three
>> are in code blocks that are commented out. Is it possible that you used a
>> script that didn't make it into svn?
>>
>> Jim
>>
>>
>>
>> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrjc42 at gmail.com> wrote:
>>
>>> Hi Jim,
>>>
>>> You asked me on Friday where the PFAM Ids for yeast came from and I
>>> couldn't recall because at the moment I was at Seattle Childrens (and thus
>>> nowhere near my copy of my source code). But I also said I would look into
>>> it for you later (and I have). Here is what my code tells me: So ever
>>> since IPI shut down, we have been getting the PFAM and IPI data from
>>> UniProt. There is a script in the UniProt.ws package
>>> called processDataForBuild.R that is supposed to be called by the script
>>> "src_build.sh" (it's the last thing that script does). That code should
>>> get the pfam data from yeast for you. Please note that yeast required a
>>> lot of special code to get it processed. Nothing with yeast annotations is
>>> ever easy. It's like karmic accounting to compensate for all the bread and
>>> beer. ;)
>>>
>>> Let me know if you need any more explanations about what is in there.
>>> Because of the crazy timing, before I left I built and pushed into devel a
>>> fresh set of .DB0s and core packages (in late August) just in case it was
>>> too crazy to do a refresh right now. But it sounds like you won't need
>>> that.
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>>> I am building the annotation db0 packages for the upcoming Bioconductor
>>>> release, which are used to generate all the orgDb and chip annotation
>>>> packages that we distribute. Up to the previous release we have always
>>>> included IPI identifiers (as part of the table containing the PROSITE and
>>>> PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer
>>>> maintained (since 2011), and UniProt, which is where we got data for the
>>>> last few releases, has now dropped support as well.
>>>>
>>>> Given that this annotation source is no longer maintained, I decided to
>>>> exclude these IDs from the current build of the following db0 packages:
>>>>
>>>> - rat.db0
>>>> - chicken.db0
>>>> - zebrafish.db0
>>>> - mouse.db0
>>>> - bovine.db0
>>>> - human.db0
>>>>
>>>> In addition, it is not clear to me (nor can Marc recall) where the data for
>>>> PFAM in the yeast.db0 package comes from. Given that we are pretty far
>>>> behind schedule for these packages, I have excluded that table as well.
>>>>
>>>> If this will break anybody's package, or if there are people who rely on
>>>> these IDs, I can just parse out of the last release and deprecate, so
>>>> you
>>>> will have the IDs for one more release. However, if nobody cares about
>>>> such
>>>> things, I will just go with what we have. Please speak up if this will
>>>> affect you.
>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>>
>>
>
>
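The search for 'INSERT INTO pfam' in the quoted exchange matched live statements, commented-out SQL (`-- INSERT INTO pfam`), and the separate `pfamb` table. A minimal sketch, assuming the scripts are `.sql` files under `annosrc` and use the `--` comment prefix seen in those results, that lists only uncommented inserts into `pfam` itself; the function name is hypothetical.

```shell
# Hypothetical helper: list uncommented "INSERT INTO pfam" statements in
# *.sql files below a directory (default: current directory).
# Lines starting with the SQL comment marker "--" do not match, because the
# pattern is anchored at line start (optionally after whitespace).
# "pfamb" is excluded by requiring whitespace, "(" or end of line after "pfam".
find_active_pfam_inserts() {
  grep -rniE --include='*.sql' \
    '^[[:space:]]*INSERT INTO pfam([[:space:](]|$)' "${1:-.}"
}
```

Run from `annosrc`, this would be expected to report only the `custom_anno/script/bindb.sql` line from the results above. It does not recognize statements inside `/* ... */` block comments, so it is a quick filter rather than a SQL parser.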