[Bioc-devel] eset.Rnw revised in Biobase, please review

Kasper Daniel Hansen khansen at stat.Berkeley.EDU
Tue Sep 6 00:45:30 CEST 2005


Hi Vince and others

Below are my first thoughts about the eSet class. I must say that I
like small "tight" classes with strong validity checking.

I will start with some specific comments:

1) The history slot: a reasonable idea. But if we have a specific
history slot, shouldn't it be filled automatically every time an eSet
is created or modified? That is, every replacement function or
initialization should update this slot. Otherwise I do not really see
the need to keep this slot separate from the notes.
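
To illustrate the idea of an automatically maintained history, here is
a minimal sketch. It uses a toy class, "toySet", with only a notes and
a history slot; the class name and the wording of the history entries
are assumptions made for the example and are not part of Biobase.

```r
library(methods)

## Toy stand-in for eSet: 'toySet' and its slots are hypothetical.
setClass("toySet",
         representation(notes = "character", history = "character"))

setGeneric("notes<-", function(object, value) standardGeneric("notes<-"))

## Replacement method that records every modification in the history slot.
setReplaceMethod("notes", "toySet", function(object, value) {
    object@notes <- value
    object@history <- c(object@history,
                        paste(format(Sys.time()), "notes replaced"))
    object
})

x <- new("toySet")
notes(x) <- "first batch"
notes(x) <- "first batch, renormalized"
length(x@history)   # one entry per replacement
```

The same pattern would have to be repeated in every replacement method
and in initialize, which is the cost of keeping the slot trustworthy.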

2) The dim method: since it is part of your validity checking that  
every component of the assayData slot has the same dimensions, there  
is no need to have the dim be a matrix (every column will by  
definition be the same). You need an internal method to extract the  
matrix of dimensions, in order to do the validity checking of course...
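
A rough sketch of that split between an internal matrix of dimensions
and a user-level dim that returns a single vector. Here "assay" is a
plain named list standing in for the assayData slot, and the helper
names (assayDims, sameDims, commonDim) are made up for the example.

```r
## 'assay' is a hypothetical stand-in for the assayData slot.
assay <- list(green = matrix(0, nrow = 5, ncol = 3),
              red   = matrix(1, nrow = 5, ncol = 3))

## Internal helper: a 2 x k matrix, one column of dimensions per component.
assayDims <- function(assay) vapply(assay, dim, integer(2))

## Validity-style check: every column must equal the first one.
sameDims <- function(assay) {
    d <- assayDims(assay)
    all(d == d[, 1])
}

## User-level dim: a single vector, since all components agree.
commonDim <- function(assay) {
    stopifnot(sameDims(assay))
    dim(assay[[1]])
}

commonDim(assay)
```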

3) I like the idea of having reportNames separate from the assayData.
That also means that the names do not need to be unique. But should
sampleNames be a separate slot or just be the rownames of the
phenoData slot? There should be some kind of check that the length
of these names is either 0 (no names given) or equal to the number of
samples/reporters.
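
Such a length check could look roughly like the sketch below. The
helper "checkNames" is hypothetical; it returns TRUE or a message, in
the style of an S4 validity function.

```r
## Hypothetical helper: TRUE if 'nms' is empty or has exactly 'n' entries,
## otherwise a character string describing the problem.
checkNames <- function(nms, n, what) {
    if (length(nms) == 0L || length(nms) == n)
        TRUE
    else
        sprintf("%s has length %d, expected 0 or %d",
                what, length(nms), n)
}

checkNames(character(0), 3L, "sampleNames")       # no names given
checkNames(c("a", "b", "c"), 3L, "sampleNames")   # one name per sample
checkNames(c("a", "b"), 3L, "sampleNames")        # wrong length
```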

4) I think the class of reporterInfo (data.frameOrNULL) is a bit too
strict. You give a compelling reason that we might want to give a
control/active factor. Now, since the number of reporters is huge,
this slot will (if not empty) be a very big structure, so I think we
really want to allow a very specific usage of this kind of slot
(data.frames are not terribly efficient). I would like the option of
having it be either a factor, an integer or a matrix. A possible use
scenario (which I strongly advocate) would be the use of an integer
to indicate (x,y) position on the chip for AffyBatch-like objects
(right now the map between row and (x,y) position in the AffyBatch
object is implicit, which does not allow for subsetting of the object,
since that would break the link).
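
A small sketch of why an explicit integer position survives subsetting.
The chip size, the data and the encoding x + ncolChip * (y - 1) are all
assumptions chosen for the example; this is not necessarily the
convention used by the affy package.

```r
## Hypothetical 4 x 3 chip: 12 reporters, 2 samples.
ncolChip <- 4L
nrowChip <- 3L
intens <- matrix(seq_len(24), nrow = 12)

## Explicit integer index per reporter, encoding x + ncolChip * (y - 1).
pos <- seq_len(ncolChip * nrowChip)

## Decode an index back to 1-based (x, y) coordinates.
indexToXY <- function(i, nc)
    cbind(x = (i - 1L) %% nc + 1L, y = (i - 1L) %/% nc + 1L)

## Subset the reporters; the position vector is subset alongside,
## so the link between row and chip position is kept.
keep <- c(2L, 7L, 12L)
sub <- intens[keep, , drop = FALSE]
subPos <- pos[keep]
indexToXY(subPos, ncolChip)
```

With an implicit map (row number equals position), the rows of "sub"
would no longer say where on the chip they came from.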

Also, if someone wants to do splitting of the assayData based on a
factor, it may be _way_ more efficient to have the split done once
and for all (I imagine assayDataControl, assayDataActive), instead of
using a factor to do the split "every time". This is btw not really
doable in the current setup, since the two structures would have
different dimensions. Hmm. I haven't really thought this through.
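
A sketch of splitting once and keeping the pieces, using a small
made-up matrix and factor. It also shows that the two pieces end up
with different numbers of rows.

```r
## Hypothetical data: 6 reporters x 2 samples, with a control/active factor.
assay <- matrix(seq_len(12), nrow = 6,
                dimnames = list(paste0("r", 1:6), c("s1", "s2")))
type <- factor(c("control", "active", "active",
                 "control", "active", "active"))

## Split the row indices once and store the resulting sub-matrices,
## rather than re-splitting by the factor on every use.
idx <- split(seq_len(nrow(assay)), type)
pieces <- lapply(idx, function(i) assay[i, , drop = FALSE])

dim(pieces$control)
dim(pieces$active)
```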

5) I am not really in favour of the varMetadata slot of the phenoData
class, although the vignette seems to indicate that this was included
in Bioc 1.6. The only example you include is the specification of
units, something I feel belongs in the varLabels slot, such as
"specimen age, in years". As I currently understand it, I feel this
is a bit too much annotation. The same goes for a hypothetical
reporterMetadata slot. Perhaps you have another usage in mind? There
does not seem to be any validity checking of this slot?

6) The assayData slot: I do not really understand the pass-by-reference
comments you make in the vignette, but they seem to indicate that
there would be performance gains from using an environment. Could you
explain this in some more detail? And if there are, I see no reason
to allow a list type structure. I think it should be mandatory to
have either a list or an environment; allowing both just adds
confusion. I would rather have the community choose the most
efficient way and then "force" developers to use this.
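
For reference, the difference in semantics between the two containers
can be seen in a few lines. This sketch only shows that a list is
copied on modification inside a function while an environment is
shared; it does not measure any performance gain. The function
"setFirst" and the component name "exprs" are made up for the example.

```r
## Tries to modify the first entry of the 'exprs' component.
setFirst <- function(container) {
    container$exprs[1, 1] <- -1
    invisible(NULL)
}

m <- matrix(0, nrow = 2, ncol = 2)

asList <- list(exprs = m)
asEnv <- new.env()
assign("exprs", m, envir = asEnv)

setFirst(asList)   # changes a local copy only
setFirst(asEnv)    # changes the shared environment

asList$exprs[1, 1]                  # caller's list is untouched
get("exprs", envir = asEnv)[1, 1]   # caller's environment was modified
```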

7) So the assayData slot does not have a specific number/names for
its components. I see the need for this. But let us say I want to use
it for a specific case where I have two assays (let us say a two-color
microarray experiment). Do you imagine that people will create more
specific versions of the class by something like (code not tested)
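   setClass("twocolor", contains = "eSet",
      validity = function(object) {
         ## no explicit validObject(as(object, "eSet")) needed:
         ## validObject() checks the superclasses before this function
         ## (if assayData is an environment, ls() may be needed
         ## instead of names())
         nms <- sort(names(assayData(object)))
         if (!identical(nms, c("green", "red")))
            return("assayData must have components 'green' and 'red'")
         TRUE
       })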
or how do users actually make sure that the elements of the assayData  
have the relevant names (and numbers)?

Kasper


On Sep 2, 2005, at 9:26 AM, Vincent Carey 525-2265 wrote:

> We need discussion of the eSet class, which is to take the place
> of exprSet in the future.  eset.Rnw in Biobase/inst/doc has
> been revised.  Please review and discuss.
>
> you will need R 2.2 and the latest Biobase to build this vignette.
>
> vc
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


