[Bioc-devel] Biobase / eSet changes for this release
Rafael A. Irizarry
ririzarr at jhsph.edu
Wed Apr 19 01:02:18 CEST 2006
Martin and Seth,
Thanks for this!
Regarding SnpSet, all applications I've seen are either genotype calling,
copy number estimation, or both. So I do think that the "minimum" SnpSet
should contain 4 matrices. As you point out, we can always leave some
of the matrices empty. We could also define subclasses, e.g.,
GenoTypeSnpSet and CopyNumberSnpSet.
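
To make the idea concrete, here is a rough sketch in plain S4, not the
actual Biobase/eSet code. It assumes the EmptyMatrix class proposed
further down in this thread. The container name SnpSetSketch and the
four slot names (calls, callsConfidence, copyNumber, cnConfidence) are
made up for illustration only; the real slot names are still to be
decided.

```r
library(methods)

# Marker class for "nothing stored here"; it extends the basic matrix type.
setClass("EmptyMatrix", contains = "matrix")

# Hypothetical container with four matrix slots (names are illustrative,
# not the Biobase definition). Every slot defaults to an EmptyMatrix.
setClass("SnpSetSketch",
         representation(calls = "matrix", callsConfidence = "matrix",
                        copyNumber = "matrix", cnConfidence = "matrix"),
         prototype(calls = new("EmptyMatrix"),
                   callsConfidence = new("EmptyMatrix"),
                   copyNumber = new("EmptyMatrix"),
                   cnConfidence = new("EmptyMatrix")),
         validity = function(object) {
           mats <- lapply(slotNames(object), function(nm) slot(object, nm))
           # Only matrices that actually hold data take part in the check.
           filled <- Filter(function(m) !is(m, "EmptyMatrix"), mats)
           if (length(unique(lapply(filled, dim))) > 1)
             "all non-empty matrices must have the same dimensions"
           else TRUE
         })

# Subclasses that insist on the matrix their kind of analysis needs.
setClass("GenoTypeSnpSet", contains = "SnpSetSketch",
         validity = function(object)
           if (is(object@calls, "EmptyMatrix")) "calls must not be empty"
           else TRUE)
setClass("CopyNumberSnpSet", contains = "SnpSetSketch",
         validity = function(object)
           if (is(object@copyNumber, "EmptyMatrix")) "copyNumber must not be empty"
           else TRUE)

# A genotype-only object: the other three slots stay empty and cost nothing.
geno <- new("GenoTypeSnpSet",
            calls = matrix(c("AA", "AB", "BB", "AA"), nrow = 2))
```

With something like this, a genotype-only package never allocates a
matrix of NAs, and the validity check simply skips whatever is empty.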
Once you are done let us know and we will give you feedback within a day.
-r
On Tue, 18 Apr 2006, Martin Morgan wrote:
> I'll implement EmptyMatrix. I will then update validity checking (and
> other methods, as necessary) to allow for it in place of any/all
> elements in a class. This will take a couple of days to get around to,
> and in the meantime I'll talk with Robert and Seth about what
> elements are actually in SnpSet; in some ways the EmptyMatrix idea
> makes a more flexible approach (specifying several required slots,
> e.g., expression, call, copyNumber, but usually functioning correctly
> if these are occupied by EmptyMatrix) appealing.
>
> It might help to have a two-second overview of how 'expression' data might
> be generally useful; I can see how call and copyNumber will feed into
> 'downstream' analyses.
>
> Martin
>
> Seth Falcon <sfalcon at fhcrc.org> writes:
>
>> Hi Rafa,
>>
>> I'm going to answer out of order...
>>
>> Rafael A Irizarry <ririzarr at jhsph.edu> writes:
>>> 2) It seems that in general we will be storing an estimate (expression,
>>> calls, copynumber) and some kind of measure of uncertainty (SE for
>>> expression, p-value for calls, etc.). However, a big chunk of the apps
>>> will not use uncertainty. It would be a shame to have to store a matrix
>>> of NAs every single time. How hard would it be to have eSet take NULL for
>>> some matrices? The validity check can look at everything that is not NULL.
>>> Notice that the alternative is to define a new class, which means, in my
>>> case, I'll need two classes for every class I'm defining, or to have a
>>> matrix of NAs, which, given the size of data these days, will be very
>>> inefficient.
>>
>> I think we should define an EmptyMatrix class:
>>
>> setClass("EmptyMatrix", contains="matrix")
>>
>> Why? Because:
>>
>> * new("EmptyMatrix") is small (no elements)
>> * We can dispatch on it. I think we might be able to get some
>> propagation of empty similar to how NA works. This can keep the
>> code from having to do lots of explicit type checking.
>> * NULL can happen by accident and should be an error. EmptyMatrix
>> won't just "appear" from a calculation.
>> * It will also mean that these values could go in a proper slot of
>> type "matrix" without having to create the matrixOrNull class
>> union.
>>
>>> 1) SnpSet as it is will not be useful. As far as I know only Rob and I
>>> are developing software that will use SnpSet. Both of us need a slot
>>> for copynumber. Otherwise we will need to create a new class, which will
>>> be the only one used, and we won't get to use the name SnpSet.
>>
>> There is something to be said for reserving the general class names
>> for things that will be general. Perhaps it makes sense to choose a
>> slightly more specific name while sorting out the all-purpose use
>> cases.
>>
>> Your RafaSnpSet can become SnpSet. But if you start with SnpSet and
>> go off in a direction that is not useful to others, then there is no
>> easy fix.
>>
>>
>>
>> + seth
>