[Bioc-devel] eset.Rnw revised in Biobase, please review
Kasper Daniel Hansen
khansen at stat.Berkeley.EDU
Tue Sep 6 00:45:30 CEST 2005
Hi Vince and others
Below are my first thoughts about the eSet class. I must say that I
like small, "tight" classes with strong validity checking.
I will start with some specific comments:
1) The history slot: a reasonable idea. But if we have a specific
history slot, shouldn't it be filled automatically every time an eSet
is created or modified? That is, every replacement function or
initialization should update this slot. Otherwise I do not really see
the need to keep this slot separate from the notes.
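To illustrate point 1, here is a minimal sketch of a replacement
method that records itself in a history slot. It uses a toy class
rather than the real eSet: "miniSet", its slots, and the "notes<-"
generic are hypothetical names chosen only for this example.

```r
library(methods)

## Toy stand-in for eSet; class, slots and generic are hypothetical
setClass("miniSet",
         representation(notes = "character", history = "character"))

setGeneric("notes<-", function(object, value) standardGeneric("notes<-"))

setReplaceMethod("notes", "miniSet", function(object, value) {
    object@notes <- value
    ## every modification appends one entry to the history slot
    object@history <- c(object@history,
                        paste(format(Sys.time()), "notes replaced"))
    object
})

x <- new("miniSet", notes = "raw", history = "created")
notes(x) <- "normalized"
x@history   # two entries: "created" plus the time-stamped replacement
```

If every replacement method followed this pattern, the history would
stay in sync with the object without any effort from the user.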
2) The dim method: since it is part of your validity checking that
every component of the assayData slot has the same dimensions, there
is no need to have the dim be a matrix (every column will by
definition be the same). You need an internal method to extract the
matrix of dimensions, in order to do the validity checking of course...
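A rough sketch of point 2, assuming assayData is a plain list of
matrices (an environment would need a different way of iterating, and
an empty list is not handled). The helper names assayDims and sameDims
are made up for illustration.

```r
## Hypothetical internal helper: one row per assayData element
assayDims <- function(assayData) {
    d <- t(sapply(assayData, dim))
    colnames(d) <- c("reporters", "samples")
    d
}

## TRUE when every element has the same dimensions as the first one
sameDims <- function(assayData) {
    d <- assayDims(assayData)
    all(d[, 1] == d[1, 1]) && all(d[, 2] == d[1, 2])
}

ad  <- list(green = matrix(0, 5, 3), red = matrix(1, 5, 3))
bad <- list(green = matrix(0, 5, 3), red = matrix(1, 4, 3))

sameDims(ad)              # used by the validity check
sameDims(bad)
dim1 <- assayDims(ad)[1, ]   # what a user-level dim() could return
```

The matrix is only needed inside the validity check; once the object
is valid, a single row carries all the information.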
3) I like the idea of having reporterNames separate from the assayData.
That also means that the names do not need to be unique. But should
sampleNames be a separate slot or just be the rownames of the
phenoData slot? There should be some kind of check that the length
of these names is either 0 (no names given) or equal to the number of
samples/reporters.
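The length check in point 3 could look something like the following.
This is only a sketch; checkNames is a hypothetical helper, written so
that its result can be returned directly from a validity function
(TRUE on success, a message otherwise).

```r
## Hypothetical helper: names are either absent or match the dimension
checkNames <- function(nms, n, what) {
    if (length(nms) == 0L || length(nms) == n)
        TRUE
    else
        sprintf("%s has length %d, expected 0 or %d",
                what, length(nms), n)
}

ok1 <- checkNames(character(0), 3L, "sampleNames")      # no names given
ok2 <- checkNames(c("a", "b", "c"), 3L, "sampleNames")  # one per sample
bad <- checkNames(c("a", "b"), 3L, "sampleNames")       # wrong length
```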
4) I think the class of reporterInfo (data.frameOrNULL) is a bit too
strict. You give a compelling reason that we might want to give a
control/active factor. Now, since the number of reporters is huge,
this slot will (if not empty) be a very big structure, so I think we
really want to allow a very specific usage of this kind of slot
(data.frames are not terribly efficient). I would like the option of
having it be either a factor, an integer or a matrix. A possible use
scenario (which I strongly advocate) would be the use of an integer
to indicate (x,y) position on the chip for AffyBatch-like objects
(right now the map between row and (x,y) position in the AffyBatch
object is implicit, which does not allow for subsetting of the object,
since that would break the link).
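A small sketch of why an explicit integer per reporter would survive
subsetting. The encoding used here (row-major index on a chip of width
ncolChip, with 0-based x and y) is an assumption made for the example,
not necessarily the AffyBatch convention, and all names are
hypothetical.

```r
ncolChip <- 4L            # hypothetical chip width
pos <- 1:12               # one encoded position per reporter

## decode an integer position into 0-based (x, y) coordinates
xy <- function(i, nc) cbind(x = (i - 1L) %% nc, y = (i - 1L) %/% nc)

intensity <- matrix(seq_len(24), nrow = 12)

keep   <- c(2L, 7L, 11L)
subInt <- intensity[keep, , drop = FALSE]
subPos <- pos[keep]       # subset the positions together with the data

coords <- xy(subPos, ncolChip)   # still the original chip coordinates
```

Because the position travels with each row, the subset still knows
where every reporter sits on the chip.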
Also, if someone wants to do splitting of the assayData based on a
factor, it may be _way_ more efficient to have the split done once
and for all (I imagine assayDataControl, assayDataActive) (something
which btw is not really doable in the current setup since the two
structures would have different dimensions), instead of using a
factor to do the split "every time". Hmm. I haven't really thought this
through.
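For concreteness, splitting once on a control/active factor might look
like this. It is a sketch on a toy matrix; the object names are made
up, and it also shows the dimension mismatch mentioned above.

```r
assay <- matrix(1:12, nrow = 6,
                dimnames = list(paste("r", 1:6, sep = ""),
                                c("s1", "s2")))
type <- factor(c("control", "active", "active",
                 "control", "active", "active"))

## split the row indices once, then store the pieces
idx <- split(seq_len(nrow(assay)), type)
assayByType <- lapply(idx, function(i) assay[i, , drop = FALSE])

dim(assayByType$control)   # fewer rows than ...
dim(assayByType$active)    # ... the active part
```

The two pieces have different numbers of rows, so they could not both
live in an assayData slot that requires identical dimensions.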
5) I am not really in favour of the varMetadata slot of the phenoData
class, although the vignette seems to indicate that this was included
in Bioc 1.6. The only example you include is the specification of
units, something I feel belongs in the varLabels slot, such as
"specimen age, in years". As I currently understand it, I feel this
is a bit too much annotation. The same goes for a hypothetical
reporterMetadata slot. Perhaps you have another usage in mind? There
does not seem to be any validity checking of this slot?
6) The assayData slot: I do not really understand the
pass-by-reference comments you make in the vignette, but they seem to
indicate that there would be performance gains from using an
environment. Could you explain this in some more detail? And if there
are, I see no reason to allow a list type structure. I think it should
be mandatory to have either a list or an environment; allowing both
just adds confusion. I would rather have the community choose the
most efficient way and then "force" developers to use this.
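As background for point 6, this is the general difference in R between
passing a list and passing an environment to a function. It is plain R
semantics, not necessarily what the vignette has in mind; the function
names are hypothetical.

```r
## a list is copied on modification: the caller's object is untouched
addOneList <- function(l) { l$x <- l$x + 1; invisible(NULL) }

## an environment is a reference: the change is visible to the caller
addOneEnv <- function(e) { e$x <- e$x + 1; invisible(NULL) }

l <- list(x = 1)
e <- new.env()
e$x <- 1

addOneList(l)
addOneEnv(e)

l$x   # still 1
e$x   # now 2
```

With large assay matrices, avoiding the copy is presumably where any
performance gain would come from, at the price of side effects that
callers may not expect.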
7) So the assayData slot does not have a specific number/names for
its components. I see the need for this. But let us say I want to use
it for a specific case where I have two assays (let us say a
two-color microarray experiment). Do you imagine that people will create
more specific versions of the class by something like (code not tested)
setClass("twocolor", contains = "eSet",
    validity = function(object) {
        ## the validity method of the parent class "eSet" is run
        ## automatically before this one, so no explicit call is needed
        if (!identical(sort(names(assayData(object))), c("green", "red")))
            return("assayData must have exactly the components 'green' and 'red'")
        TRUE
    })
or how do users actually make sure that the elements of the assayData
have the relevant names (and numbers)?
Kasper
On Sep 2, 2005, at 9:26 AM, Vincent Carey 525-2265 wrote:
> We need discussion of the eSet class, which is to take the place
> of exprSet in the future. eset.Rnw in Biobase/inst/doc has
> been revised. Please review and discuss.
>
> you will need R 2.2 and the latest Biobase to build this vignette.
>
> vc
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>