[Bioc-devel] SummarizedExperiments

Martin Morgan mtmorgan at fhcrc.org
Fri Sep 14 23:19:16 CEST 2012


On 09/14/2012 11:46 AM, Kasper Daniel Hansen wrote:
> On Fri, Sep 14, 2012 at 2:25 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>> For what it's worth I already wrote a CombineSEwithNAs() function to do this
>> on the disjoint ranges for RRBS.  It assumes that there isn't any additional
>> colData or elementMetadata of interest (for reasons that will become clear)
>> and further assumes that the user will want to smooth.
>
> I have one in bsseq as well (as I said earlier), but this would still
> be nice to think about in the most general case possible.
>
>> seqlevels, seqlengths, genome are implemented via seqinfo as of the latest
>> GenomicRanges package (went in some time ago, thanks to MM):
>
> There is something I am missing here.  It clearly works.  But
> showMethods("genome") tells me that methods are defined for ANY and
> Seqinfo, but if I use
>    example(SummarizedExperiment)
> to get sset defined, I still get
>    is(sset, "Seqinfo")
> to be FALSE.  I thought this would check for inheritance.

The 'ANY' method on genome is implemented so that any class with a 
seqinfo method (here seqinfo,SummarizedExperiment-method) gets 'genome' 
for free; no inheritance from Seqinfo is involved. Another example is 
'rownames' and 'colnames', which are provided for free when a 
dimnames,SummarizedExperiment-method is defined.

 > selectMethod("genome", "SummarizedExperiment")
Method Definition:

function (x)
genome(seqinfo(x))
<environment: namespace:GenomicRanges>

Signatures:
         x
target  "SummarizedExperiment"
defined "ANY"
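
A minimal sketch of the same pattern, using only the methods package.
ToyInfo, ToyExperiment, toyInfo and toyGenome are hypothetical stand-ins
for Seqinfo, SummarizedExperiment, seqinfo and genome; this is an
illustration under those assumptions, not the GenomicRanges code.

```r
library(methods)

## Hypothetical stand-ins for Seqinfo and SummarizedExperiment
setClass("ToyInfo", representation(genome = "character"))
setClass("ToyExperiment", representation(info = "ToyInfo"))

setGeneric("toyInfo", function(x) standardGeneric("toyInfo"))
setGeneric("toyGenome", function(x) standardGeneric("toyGenome"))

## Direct method on the info class
setMethod("toyGenome", "ToyInfo", function(x) x@genome)

## Fallback 'ANY' method: delegate to whatever toyInfo() returns
setMethod("toyGenome", "ANY", function(x) toyGenome(toyInfo(x)))

## The only method the container class has to supply
setMethod("toyInfo", "ToyExperiment", function(x) x@info)

expt <- new("ToyExperiment",
            info = new("ToyInfo", genome = c(chr1 = "hg19")))

toyGenome(expt)       # handled by the 'ANY' method
is(expt, "ToyInfo")   # FALSE: the container does not extend ToyInfo
selectMethod("toyGenome", "ToyExperiment")
```

In this sketch the container never inherits from the info class, so
is() returns FALSE while the accessor still works, which is the
behaviour asked about above.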

Martin

>
>> R> head(genome(LAML))
>>    chr1  chr10  chr11  chr12  chr13  chr14
>> "hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
>> R> head(seqlengths(LAML))
>>       chr1     chr10     chr11     chr12     chr13     chr14
>> 249250621 135534747 135006516 133851895 115169878 107349540
>>
>> I wrote a trivial addSeqinfo(x) function that, given a genome, will populate
>> an SE's seqinfo automatically from a BSgenome (if there is one).  The
>> function calls
>> rtracklayer:::SeqinfoForBSGenome(unique(na.omit(genome(x))))[seqlevels(x)]
>> to get the correct information.
>>
>> I hate the fact that there can be NA or differing genomes specified
>> per-chromosome for SummarizedExperiments.  It makes me sad.
>
> I don't like that you can mix hg18/hg19, but on the other hand we routinely
> spike in lambda phage and that is not really part of the human genome.
>
>>
>>
>>
>> On Fri, Sep 14, 2012 at 10:54 AM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Thanks for all the additional methods.  I still miss
>>>    seqlevels, seqlengths, genome
>>>
>>> Below,
>>>
>>> On Wed, Sep 12, 2012 at 3:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>> On 09/12/2012 12:15 PM, Kasper Daniel Hansen wrote:
>>>
>>>>> One thing I have in my package that I find indispensable is combine
>>>>> and (my own) combineList.  The latter is for combining > 2 objects, which
>>>>> has a lot of possibilities for speed-up, especially if (very common)
>>>>> all the objects have the same rowData, as opposed to Reduce(combine,
>>>>> LIST).  Use case: you need to add additional samples to your
>>>>> SummarizedExperiment.
>>>>
>>>>
>>>> I found it difficult in Biobase to write combine methods for eSet, where
>>>> you're really requiring a lot from the user (about the phenoData /
>>>> featureData structured in the same way) or going through contortions to
>>>> make it the same in a reasonable-but-ad-hoc way (e.g., when two columns
>>>> are factors with the same set of levels but encoded differently). Maybe
>>>> the effort required is proportional to the utility of the function
>>>> provided...
>>>> I'll give it some more thought.
>>>
>>> In the abstract case it is hard to imagine combining different
>>> SummarizedExperiments.  My use case is almost always "additional
>>> samples from the same experiment", and for that situation it is a lot
>>> easier to imagine combining it.  You still need to check that the
>>> granges are similar (and if not, expand some of the assayData with
>>> zeroes or NAs), since the new samples may have coverage in locations
>>> not assayed earlier.  Clearly factors are hard to handle and I assume
>>> there are other hard-to-handle cases.  Nevertheless, I find such a
>>> function incredibly useful.
>>>
>>> I think it is entirely ok to assume that the user knows what (s)he is
>>> doing.
>>>
>>> Kasper
>>
>>
>>
>>
>> --
>> A model is a lie that helps you see the truth.
>>
>> Howard Skipper
>>
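
For the "additional samples from the same experiment" case quoted above,
a rough base-R sketch of the NA-padding step. This is an illustration
only: combineWithNA is a hypothetical helper, plain matrices stand in
for assay data, and rownames stand in for the ranges. It is not the
bsseq or CombineSEwithNAs() implementation, neither of which is shown in
this thread.

```r
## Hypothetical helper: rows are keyed by rowname (a stand-in for ranges).
## Rows missing from one input are filled with NA for that input's samples.
combineWithNA <- function(x, y) {
    stopifnot(!is.null(rownames(x)), !is.null(rownames(y)))
    allRows <- union(rownames(x), rownames(y))
    pad <- function(m) {
        out <- matrix(NA_real_, nrow = length(allRows), ncol = ncol(m),
                      dimnames = list(allRows, colnames(m)))
        out[rownames(m), ] <- m   # copy the rows that were actually assayed
        out
    }
    cbind(pad(x), pad(y))
}

old <- matrix(c(1, 2, 3, 4), nrow = 2,
              dimnames = list(c("chr1:100", "chr1:200"), c("s1", "s2")))
extra <- matrix(c(5, 6), nrow = 2,
                dimnames = list(c("chr1:200", "chr1:300"), "s3"))

combined <- combineWithNA(old, extra)
combined
```

Matching on rownames sidesteps the harder parts raised in the thread
(reconciling colData, factor levels, and genuinely overlapping but
non-identical ranges), which a real combine method would still have to
address.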


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


