[Bioc-devel] IPI numbers in annotation packages

James W. MacDonald jmacdon at uw.edu
Mon Oct 5 15:11:59 CEST 2015


Hi Marc,

That script has this in it:

## For now just get data for the ones that we have traditionally supported
## I don't even know if the other species are available...
speciesList = c("chipsrc_human.sqlite",
  "chipsrc_rat.sqlite",
  "chipsrc_chicken.sqlite",
  "chipsrc_zebrafish.sqlite",
  #  "chipsrc_worm.sqlite",
  #  "chipsrc_fly.sqlite",
  "chipsrc_mouse.sqlite",
  "chipsrc_bovine.sqlite"
  #  "chipsrc_arabidopsis.sqlite"  ## this is available and could be "activated"
  ## But to activate arabidopsis, remember you have to pre-add the tables...
  #  "chipsrc_canine.sqlite",
  #  "chipsrc_rhesus.sqlite",
  #  "chipsrc_chimp.sqlite",
  #  "chipsrc_anopheles.sqlite"
  )

And there is no mention of yeast anywhere. If I search all the scripts for
say 'INSERT INTO pfam', I get

custom_anno/script/bindb.sql
328:INSERT INTO pfam

pfam/script/srcdb_pfam.sql
202:-- INSERT INTO pfamb

organism_annotation/script/bindb_yeast.sql
441:-- INSERT INTO pfam

yeast/script/bindb.sql
241:-- INSERT INTO pfam

The first one is just doing all the metadata tables, and the other three
are in code blocks that are commented out. Is it possible that you used a
script that didn't make it into svn?
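
A rough sketch of that check follows, written in Python purely for illustration. The function names `find_statements` and `report` are hypothetical and not part of any Bioconductor build script. It assumes the scripts end in `.sql` and that disabled code is marked only with `--` line comments, so anything inside a `/* ... */` block would be misreported as active. Like the search above, the plain substring match also picks up `INSERT INTO pfamb`.

```python
import re
from pathlib import Path


def find_statements(root, pattern="INSERT INTO pfam"):
    """Yield (path, line_number, is_commented, text) for each matching line."""
    needle = re.compile(re.escape(pattern), re.IGNORECASE)
    for path in sorted(Path(root).rglob("*.sql")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                match = needle.search(line)
                if match is None:
                    continue
                # Assumption: a hit counts as commented out when a "--"
                # marker appears anywhere before the matched text.
                is_commented = "--" in line[:match.start()]
                yield path, number, is_commented, line.strip()


def report(root, pattern="INSERT INTO pfam"):
    """Return one summary line per hit, flagged as active or commented."""
    lines = []
    for path, number, is_commented, text in find_statements(root, pattern):
        status = "commented" if is_commented else "active"
        relative = path.relative_to(root).as_posix()
        lines.append(f"{relative}:{number}: [{status}] {text}")
    return lines
```

Calling `report()` on the top-level directory of a checkout returns one line per hit, which makes it easier to see at a glance whether any script still inserts into the pfam table.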

Jim



On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrjc42 at gmail.com> wrote:

> Hi Jim,
>
> You asked me on Friday where the PFAM IDs for yeast came from and I
> couldn't recall because at the moment I was at Seattle Children's (and thus
> nowhere near my copy of my source code).  But I also said I would look into
> it for you later (and I have).  Here is what my code tells me:  So ever
> since IPI shut down, we have been getting the PFAM and IPI data from
> UniProt.  There is a script in the UniProt.ws package
> called processDataForBuild.R that is supposed to be called by the script
> "src_build.sh" (it's the last thing that script does).  That code should
> get the pfam data from yeast for you.  Please note that yeast required a
> lot of special code to get it processed.  Nothing with yeast annotations is
> ever easy.  It's like karmic accounting to compensate for all the bread and
> beer.  ;)
>
> Let me know if you need any more explanations about what is in there.
> Because of the crazy timing, before I left I built and pushed into devel a
> fresh set of .DB0s and core packages (in late August) just in case it was
> too crazy to do a refresh right now.  But it sounds like you won't need
> that.
>
>
>   Marc
>
>
>
> On Sun, Oct 4, 2015 at 6:27 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> I am building the annotation db0 packages for the upcoming Bioconductor
>> release, which are used to generate all the orgDb and chip annotation
>> packages that we distribute. Up to the previous release we have always
>> included IPI identifiers (as part of the table containing the PROSITE and
>> PFAM IDs). Unfortunately, IPI <https://www.ebi.ac.uk/IPI> is no longer
>> maintained (since 2011), and UniProt, which is where we got data for the
>> last few releases, has now dropped support as well.
>>
>> Given that this annotation source is no longer maintained, I decided to
>> exclude these IDs from the current build of the following db0 packages:
>>
>>    - rat.db0
>>    - chicken.db0
>>    - zebrafish.db0
>>    - mouse.db0
>>    - bovine.db0
>>    - human.db0
>>
>> In addition, it is not clear to me (nor can Marc recall) where the data for
>> PFAM in the yeast.db0 package comes from. Given that we are pretty far
>> behind schedule for these packages, I have excluded that table as well.
>>
>> If this will break anybody's package, or if there are people who rely on
>> these IDs, I can just parse them out of the last release and deprecate them,
>> so you will have the IDs for one more release. However, if nobody cares
>> about such things, I will just go with what we have. Please speak up if this
>> will affect you.
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
