[Bioc-devel] history mechanism

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Tue Sep 6 20:54:48 CEST 2005

> One thing I think we should not do, is to try and reimplement the
> general compendium concept here. If complete provenance from raw data
> file to finished analysis is what is wanted then the compendium concept
> provides a reasonable mechanism for doing that. Other things and
> improvements are possible to that, but I don't think it will be
> worthwhile to try and force this sort of generality into a history
> mechanism. For most (all really) data analysis that I do now, I use the
> compendium approach.

The reference to the compendium concept provided by Robert
is very close to my initial but unsent response to the idea of
detailed history management.  If you want a document about
a data analysis in R, build one using Sweave or some comparable
system.  Perhaps one could even stick that on to a data object.
But don't expect the software to do this for you any time soon.

Self-auditing objects are intriguing.  Descriptions of the initial
state and the subsequent update steps can become part of the object.  It
sounds like some progress is being made there.  Development in this
area is in my view orthogonal to what we are doing in Bioc; when
self-auditing objects are working well, perhaps that model can be
adopted for eSets.

>   However, eSets are some form of self-documenting data structure and it
> is worthwhile to improve that documentation.
>   Some general ideas for comment/consideration
>   1) it would be nice for every instance to capture some input into a
> history mechanism that documents what happened to it
>    2) some objects (such as eSet and exprSet) are compound - they have
> phenoData and exprs and other more complex objects are possible. I
> suspect that each of these needs to keep its own history
>    3) we could try to catch something from every call to a Replace
> method, but it is not always easy to know what. We could add an argument
> - and thereby "force" developers to pass the information down
>    One problem with this is that we want to support [[<-, [<- and $<-,
> and these already have well defined signatures that we cannot easily change.
>    We can try to capture the call that was made to the function that is
> changing the eSet (but how do we know that is the important one? If the
> developer uses helper functions we sometimes need to look further up the
> call stack. I know of no way to solve this generally).
> Proposal:
> --------
>    I suggest that we might add a history slot, which contains the
> history, to each object that we want to collect history on. We allow for
> an optional history parameter for replacement functions and ask the
> developers to use this to tell us the important call. If it is not
> present, we can get a call automatically (using sys.parent) from the
> calling function. It will not be perfect, but would give us a start.
>    I do not expect that we can ever replay the history (it would need
> much more information and I think we end up duplicating what is in the
> compendium concept).
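
To make the proposal concrete, here is a minimal sketch in plain R.
Everything in it is hypothetical: new_toy, `toyExprs<-` and
log_transform are illustration names, a plain list stands in for an
eSet, and the optional `history` argument is an assumption of the
sketch, not an existing Biobase signature.  When no history is
supplied, the replacement function falls back to the call of whichever
function invoked it, found via sys.parent() and sys.call().

```r
# Hypothetical toy object: a plain list standing in for an eSet-like class.
new_toy <- function(exprs) list(exprs = exprs, history = character())

# Replacement function with an optional `history` argument.
`toyExprs<-` <- function(object, history = NULL, value) {
  if (is.null(history)) {
    # No call supplied by the developer: fall back to the call of the
    # function that invoked this replacement.
    parent <- sys.parent()
    history <- sys.call(parent)
  }
  object$exprs <- value
  # Store the call as text, appended to any earlier entries.
  object$history <- c(object$history, paste(deparse(history), collapse = " "))
  object
}

# A function that modifies the object without passing a history.
log_transform <- function(x) {
  toyExprs(x) <- log2(x$exprs)
  x
}

obj <- new_toy(matrix(c(1, 2, 4, 8), nrow = 2))
auto <- log_transform(obj)        # call captured automatically

# A developer who knows the "important" call passes it explicitly.
manual <- obj
toyExprs(manual, history = quote(normalize(raw, method = "quantile"))) <-
  obj$exprs
```

The automatic entry only names the immediate caller, so a helper
buried several levels down would record the helper rather than the
call the user made.  That is the limitation noted in the proposal.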

Making it hard to have a completely non-auditable object is
appropriate, so a call record and a time stamp should
be mandatory at certain construction events.  How could this
be useful?  If an object is created by reading some CEL files
in a given directory, it should be possible to say when it
happened and what directory it was in.  Then there's some chance
of going back to the backup tapes to reconstruct it, at least partially.
Other filtering steps might not be reflected in the history.
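
As a rough illustration of a mandatory call record and time stamp at
construction, the sketch below uses a hypothetical constructor,
read_toy_cel.  It only lists the CEL files in a directory and does not
parse them, and the `provenance` field is an invented name.  The point
is that the call, the time and the full directory path are recorded
whether or not the caller asks for them.

```r
# Hypothetical constructor: records provenance on every call.
read_toy_cel <- function(dir) {
  files <- list.files(dir, pattern = "\\.CEL$", ignore.case = TRUE)
  list(
    files = files,
    provenance = list(
      call = match.call(),       # the call that built the object
      time = Sys.time(),         # when it was built
      dir  = normalizePath(dir)  # full path of the source directory
    )
  )
}

# Usage with two empty placeholder files in a temporary directory.
d <- file.path(tempdir(), "toy_cel")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("a.CEL", "b.CEL"))))
batch <- read_toy_cel(d)
```

Later filtering steps leave `provenance` untouched, so the record
describes only how the object came into being.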

I'm not too enthused about fiddling with replacement functions
but some experiments and use cases could change that.

PS I think getting people to understand and use the compendium concept
could be more valuable than the history slot.  Google on "bioconductor
compendium" to find the primary example and a paper at BEpress
on the topic.
