[Bioc-devel] history mechanism

Robert Gentleman rgentlem at fhcrc.org
Tue Sep 6 20:15:37 CEST 2005


Hi,

   Thanks to Kevin, Vince, Kasper, and others for raising this issue and 
for some useful ideas.

One thing I think we should not do is try to reimplement the 
general compendium concept here. If complete provenance from raw data 
file to finished analysis is what is wanted, then the compendium concept 
provides a reasonable mechanism for doing that. Improvements to it are 
possible, but I don't think it will be worthwhile to try to force this 
sort of generality into a history mechanism. For most (all, really) data 
analysis that I do now, I use the compendium approach.

  However, eSets are some form of self-documenting data structure and it 
is worthwhile to improve that documentation.


  Some general ideas for comment/consideration

  1) it would be nice for every instance to carry a history record 
that documents what happened to it

   2) some objects (such as eSet and exprSet) are compound - they have 
phenoData and exprs, and other more complex objects are possible. I 
suspect that each of these components needs to keep its own history

   3) we could try to capture something from every call to a Replace 
method, but it is not always easy to know what. We could add an argument 
and thereby "force" developers to pass the information down.
   One problem with this is that we want to support [[<-, [<- and $<-, 
and these already have well-defined signatures that we cannot easily change.

   We can try to capture the call that was made to the function that is 
changing the eSet, but how do we know that is the important one? If the 
developer uses helper functions, we sometimes need to look further up the 
call stack. I know of no way to solve this generally.
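
   To illustrate the call stack problem in item 3, here is a minimal 
sketch in R. The functions recordCaller, helper and userFunction are 
hypothetical names made up for this example; only sys.call, sys.parent 
and sys.calls are base R.

```r
# Hypothetical function standing in for a replacement method that
# wants to record who modified the object.
recordCaller <- function() {
  calls <- sys.calls()  # every call currently on the stack
  list(
    immediate = sys.call(sys.parent()),  # call of the direct caller only
    stack = as.list(calls)
  )
}

# The developer routes the modification through a helper.
helper <- function() recordCaller()
userFunction <- function() helper()

res <- userFunction()
print(res$immediate)
print(res$stack)
```

   Here the immediate caller is helper(), while the call the user made, 
userFunction(), only appears further up the stack. Nothing in the stack 
itself says which of those calls is the important one.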

Proposal:
--------

   I suggest that we add a history slot to each object that we want to 
collect history on. We allow an optional history parameter for 
replacement functions and ask developers to use it to tell us the 
important call. If it is not present, we can get a call automatically 
(using sys.parent) from the calling function. It will not be perfect, 
but it would give us a start.
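
   As a rough sketch of how the proposal above might look, assuming a 
toy S4 class rather than the real eSet: ToySet, toyExprs<- and rescale 
are hypothetical names invented for this example, and the fallback uses 
sys.call(sys.parent()) as one possible way to get the caller's call.

```r
library(methods)

# Hypothetical toy class; the real eSet has more slots than this.
setClass("ToySet", representation(exprs = "matrix", history = "list"))

# Replacement function with an optional `history` argument.
# If the developer does not supply one, fall back to the call of
# whichever function invoked the replacement.
`toyExprs<-` <- function(object, history = NULL, value) {
  if (is.null(history)) {
    history <- sys.call(sys.parent())
  }
  object@exprs <- value
  object@history <- c(object@history, list(history))
  object
}

# A developer's function that modifies the object without passing history.
rescale <- function(x, mult) {
  toyExprs(x) <- x@exprs * mult
  x
}

obj <- new("ToySet", exprs = matrix(1:4, nrow = 2), history = list())

# No history argument: the call to rescale() is captured automatically.
obj <- rescale(obj, mult = 2)

# History supplied explicitly by the caller.
toyExprs(obj, history = "manual edit") <- obj@exprs + 1

print(obj@history)
```

   The history here is just a list of calls and notes appended in order. 
It records what was done but, as noted below, it is not enough to replay 
the analysis.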

   I do not expect that we can ever replay the history (it would need 
much more information, and I think we would end up duplicating what is 
in the compendium concept).

   Any comments, recommendations, or other improvements?

  Robert



-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


