[BioC] [devteam-bioc] loading/accessing older GO.db and org.Hs.eg.db data
Marc Carlson
mcarlson at fhcrc.org
Sat May 31 00:09:34 CEST 2014
Hi Jonathan,
There are plenty of very good reasons why mixing different versions of
packages is a bad idea, and why it is preferable to use an older
version of R/Bioconductor when repeating an older analysis. For one
thing, such combinations of old and new packages are untested and may
produce unpredictable results. For another, if you are really interested
in reproducing older results, you should do so by keeping all of those
variables the same as they were the first time.
But if you have read all about the risks and are still hell-bent on
doing it anyway, the most direct approach would be to swap the database
files between the source tarballs.
You can find older versions of packages at the link here (look
for the box labeled 'Previous Versions' in the lower right-hand side of
the screen):
http://www.bioconductor.org/install/
And since the database schemas have not changed all *that* much, you
might be in luck. That is, for any pair of packages you may be able to
take the .sqlite file from an older tarball and drop it into the
inst/extdata directory of a newer source tarball. This kind of hack
could work, assuming that the schema has not changed too much. But the
further back you go, the more problems you are likely to have because of
additions that were made to the metadata table, etc.
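Before trying the swap, it may help to see how far apart the two schemas
actually are. The following is only a rough sketch in Python (standard
library sqlite3), not part of AnnotationDbi; the function names
schema_objects and schema_diff are made up for illustration, and you
would pass in the paths of the .sqlite files extracted from the old and
new tarballs. It lists the tables, views and indexes that exist in one
file but not the other, or whose definitions differ.

```python
import sqlite3


def schema_objects(db_path):
    """Return {(type, name): sql} for each table, view and index in the file.

    Hypothetical helper for illustration. Note that sqlite3.connect()
    creates an empty file if db_path does not exist, so check paths first.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT type, name, sql FROM sqlite_master"
        ).fetchall()
    finally:
        con.close()
    # Skip SQLite's own internal objects (e.g. sqlite_autoindex_*).
    return {
        (kind, name): sql
        for kind, name, sql in rows
        if not name.startswith("sqlite_")
    }


def schema_diff(old_db, new_db):
    """Summarise schema differences between an older and a newer DB file."""
    old = schema_objects(old_db)
    new = schema_objects(new_db)
    shared = set(old) & set(new)
    return {
        # Present in the newer file only (e.g. views added later).
        "missing_in_old": sorted(set(new) - set(old)),
        # Present in the older file only.
        "extra_in_old": sorted(set(old) - set(new)),
        # Same name and type, but a different CREATE statement.
        "changed": sorted(key for key in shared if old[key] != new[key]),
    }
```

An empty "missing_in_old" and "changed" would suggest the swap has a
better chance of working, though it says nothing about whether the data
itself lines up.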
I actually tried this for the bioc 2.10 release (putting it into the
2.14 release) and this kind of 'brain transplant' seemed to mostly work
OK (except that the GO queries were messed up - more on that below).
But this is still not recommended. Not only will you have missing data,
but there is also data in the GO.db package and the org.Hs.eg.db
package that needs to line up (GOIDs). Without the assurance that
this data will match up, some functions that you want to use may simply
not work properly. Also, if the schema has changed, then you may find
that the swap I described above requires you to make modifications to
the table structure of the older DB. For example, in the case I tested
above, the GO terms would not work with the extractors. Why? Because
the newer DB adds several views to the data in order to get a
performance boost. If I really wanted this old data to work with my new
package software, I would also have to update its DB to contain those
newer views. You should be able to do that by just looking at the newer
DB and calling .schema in the sqlite3 shell on the relevant views.
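If you would rather script that step than copy the output of .schema by
hand, something along these lines might do it. This is a minimal sketch
in Python (standard library sqlite3), and copy_missing_views is a
hypothetical name, not an AnnotationDbi function. It reads the CREATE
VIEW statements from the newer file and runs them against the older one.
It assumes the tables and columns those views refer to also exist in the
older DB; as far as I know SQLite does not check that when the view is
created, so a mismatch may only show up when the view is queried. Work
on a copy of the older file.

```python
import sqlite3


def copy_missing_views(new_db, old_db):
    """Create in old_db each view that new_db defines and old_db lacks.

    Hypothetical helper for illustration. Returns the names of the views
    that were created. old_db is modified in place.
    """
    # Read the view definitions (what .schema would print) from the newer DB.
    src = sqlite3.connect(new_db)
    try:
        views = src.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'view'"
        ).fetchall()
    finally:
        src.close()

    dst = sqlite3.connect(old_db)
    created = []
    try:
        existing = {
            row[0]
            for row in dst.execute(
                "SELECT name FROM sqlite_master WHERE type = 'view'"
            )
        }
        for name, sql in views:
            if name in existing:
                continue  # leave views the older DB already has untouched
            dst.execute(sql)  # replay the original CREATE VIEW statement
            created.append(name)
        dst.commit()
    finally:
        dst.close()
    return created
```

After running it, querying each new view once against the older DB is a
quick way to find out whether the underlying tables really match.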
Marc
On 05/29/2014 12:20 PM, Maintainer wrote:
> Hello,
>
> I am interested in accessing old versions of GO.db and org.Hs.eg.db data. I would like them to potentially be different versions, so entirely downgrading Bioconductor and/or R (the standard response) does not seem to make sense. Also, I recognize that identifiers and mappings may be missing or not match -- that is OK.
>
> After reading through the AnnotationDbi documentation, it seems to me that annotation packages come from SQLite databases. If so, is it possible to create new annotation objects using older SQLite databases? Are there archived SQLite databases of these data?
>
> If not, what is a reasonable method in bioconductor to get older versions of the data?
>
> There are a few posts here that seem to indicate that different versioning is not a good idea. But, any direction would be much appreciated!
>
> Thanks,
> Jonathan Mortensen
>
> PhD Candidate
> Stanford Center for Biomedical Informatics Research
> Stanford, CA
> jonathanmortensen.com
> 513-225-1935
>
> -- output of sessionInfo():
>
> N/A
>
> --
> Sent via the guest posting facility at bioconductor.org.