[Bioc-devel] Biobase / eSet changes for this release

Rafael A. Irizarry ririzarr at jhsph.edu
Wed Apr 19 01:02:18 CEST 2006


Martin and Seth,

Thanks for this!

Regarding SnpSet, all applications I've seen are either genotype calling, 
copy number estimation, or both, so I do think that the "minimum" SnpSet 
should contain 4 matrices. As you point out, we can always leave some 
of the matrices empty. We could also define subclasses, e.g., 
GenoTypeSnpSet and CopyNumberSnpSet.
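
A rough sketch of that layout in plain S4, only to make the idea concrete.
It is not the Biobase implementation: the class name SketchSnpSet, the four
slot names, and the validity rules are assumptions made up for illustration.
Only EmptyMatrix, GenoTypeSnpSet, and CopyNumberSnpSet are names from this
thread.

```r
library(methods)

# Placeholder class as proposed in the quoted message below.
setClass("EmptyMatrix", contains = "matrix")

# Hypothetical slot names: one estimate plus one uncertainty matrix
# for each of genotype calls and copy number.
sketchSlots <- c("calls", "callsConfidence",
                 "copyNumber", "copyNumberConfidence")

# Hypothetical base class; every slot defaults to an EmptyMatrix.
setClass("SketchSnpSet",
  representation(calls = "matrix", callsConfidence = "matrix",
                 copyNumber = "matrix", copyNumberConfidence = "matrix"),
  prototype(calls = new("EmptyMatrix"),
            callsConfidence = new("EmptyMatrix"),
            copyNumber = new("EmptyMatrix"),
            copyNumberConfidence = new("EmptyMatrix")),
  validity = function(object) {
    # Only matrices that are actually filled in take part in the check.
    filled <- Filter(function(m) !is(m, "EmptyMatrix"),
                     lapply(sketchSlots, function(s) slot(object, s)))
    dims <- unique(lapply(filled, dim))
    if (length(dims) > 1) "non-empty matrices must share dimensions" else TRUE
  })

# Subclasses that each insist on their own estimate being present.
setClass("GenoTypeSnpSet", contains = "SketchSnpSet",
  validity = function(object)
    if (is(object@calls, "EmptyMatrix")) "calls must be filled" else TRUE)

setClass("CopyNumberSnpSet", contains = "SketchSnpSet",
  validity = function(object)
    if (is(object@copyNumber, "EmptyMatrix")) "copyNumber must be filled" else TRUE)

# A genotype-only object; the copy number slots stay empty.
geno <- new("GenoTypeSnpSet",
            calls = matrix(c("AA", "AB", "BB", "AA"), nrow = 2))
```

With this arrangement a genotype-only application never has to allocate
copy number matrices, and the shared validity check simply skips whatever
is empty.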

Once you are done let us know and we will give you feedback within a day.

-r

On Tue, 18 Apr 2006, Martin Morgan wrote:

> I'll implement EmptyMatrix. I will then update validity checking (and
> other methods, as necessary) to allow for it in place of any/all
> elements in a class. This will take a couple of days to get around to,
> and in the meantime I'll talk with Robert and Seth about what
> elements are actually in SnpSet. In some ways the EmptyMatrix idea
> makes a more flexible approach appealing: specify several required
> slots (e.g., expression, call, copyNumber) and have things usually
> function correctly when these are occupied by EmptyMatrix.
>
> It might help to have a two-second overview of how 'expression' data
> might be generally useful; I can see how call and copyNumber will feed
> into 'downstream' analyses.
>
> Martin
>
> Seth Falcon <sfalcon at fhcrc.org> writes:
>
>> Hi Rafa,
>>
>> I'm going to answer out of order...
>>
>> Rafael A Irizarry <ririzarr at jhsph.edu> writes:
>>> 2) It seems that in general we will be storing an estimate (expression,
>>> calls, copynumber) and some kind of measure of uncertainty (SE for
>>> expression, p-value for calls, etc.). However, a big chunk of the apps
>>> will not use uncertainty. It would be a shame to have to store a matrix
>>> of NAs every single time. How hard would it be to have eSet take NULL for
>>> some matrices? The validity check can look at everything that is not NULL.
>>> Notice that the alternative is to define a new class, which means, in my
>>> case, I'll need two classes for every class I'm defining, or having a
>>> matrix of NAs, which, given the size of data these days, will be very
>>> inefficient.
>>
>> I think we should define an EmptyMatrix class:
>>
>>     setClass("EmptyMatrix", contains="matrix")
>>
>> Why?  Because:
>>
>>  * new("EmptyMatrix") is small (no elements)
>>  * We can dispatch on it.  I think we might be able to get some
>>    propagation of empty similar to how NA works.  This can keep the
>>    code from having to do lots of explicit type checking.
>>  * NULL can happen by accident and should be an error.  EmptyMatrix
>>    won't just "appear" from a calculation.
>>  * It will also mean that these values could go in a proper slot of
>>    type "matrix" without having to create the matrixOrNull class
>>    union.
>>
>>> 1) SnpSet as it is will not be useful. As far as I know only Rob and I
>>> are developing software that will use SnpSet. Both of us need a slot
>>> for copynumber. Otherwise we will need to create a new class, which will
>>> be the only one used, and we won't get to use the name SnpSet.
>>
>> There is something to be said for reserving the general class names
>> for things that will be general.  Perhaps it makes sense to choose a
>> slightly more specific name while sorting out the all-purpose use
>> cases.
>>
>> Your RafaSnpSet can become SnpSet.  But if you start with SnpSet and
>> go off in a direction that is not useful to others, then there is no
>> easy fix.
>>
>>
>>
>> + seth
>
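
To make the quoted EmptyMatrix points concrete (dispatch, propagation of
"empty", and NULL being an error), here is a minimal sketch. The generic
logTransform is hypothetical and exists only for this example; it is not a
Biobase function.

```r
library(methods)

# The class definition proposed in the quoted message.
setClass("EmptyMatrix", contains = "matrix")

# Hypothetical generic, used only to show dispatch.
setGeneric("logTransform", function(x) standardGeneric("logTransform"))

# Ordinary matrices are transformed element-wise.
setMethod("logTransform", "matrix", function(x) log2(x))

# An EmptyMatrix passes through unchanged, so "empty" propagates
# without explicit type checks in the calling code.
setMethod("logTransform", "EmptyMatrix", function(x) x)

empty <- new("EmptyMatrix")
stillEmpty <- logTransform(empty)
transformed <- logTransform(matrix(c(1, 2, 4, 8), nrow = 2))

# NULL has no method, so an accidental NULL fails at dispatch
# instead of slipping through silently.
nullResult <- try(logTransform(NULL), silent = TRUE)
```

Because EmptyMatrix extends matrix, the more specific method is chosen for
empty objects while plain matrices fall through to the general one.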


