[Bioc-devel] SummarizedExperiments

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Oct 4 20:13:12 CEST 2012


For symmetry, could we get
  granges<-
added?  It is confusing that granges() works but the replacement function does not.

Thanks,
Kasper
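For readers less familiar with S4, here is a minimal sketch of the getter/replacement symmetry being requested, written with base R's `methods` package. The class `ToySE`, its slots, and the generics `toyGranges` / `toyGranges<-` are hypothetical stand-ins, not the GenomicRanges API. A data.frame stands in for a GRanges object so the example runs without Bioconductor.

```r
library(methods)

# Hypothetical toy container; a data.frame stands in for a GRanges object
setClass("ToySE", slots = c(ranges = "data.frame", counts = "matrix"))

# Getter generic and method
setGeneric("toyGranges", function(x) standardGeneric("toyGranges"))
setMethod("toyGranges", "ToySE", function(x) x@ranges)

# Matching replacement generic: the name ends in "<-" and the last
# argument must be called 'value'
setGeneric("toyGranges<-", function(x, value) standardGeneric("toyGranges<-"))
setReplaceMethod("toyGranges", "ToySE", function(x, value) {
  # Guard: the new ranges must describe the same rows as the assay
  if (nrow(value) != nrow(x@counts))
    stop("replacement must have one row per assay row")
  x@ranges <- value
  validObject(x)  # re-check slot types after modification
  x
})

se <- new("ToySE",
          ranges = data.frame(chr = c("chr1", "chr2"), start = c(100L, 200L)),
          counts = matrix(1:4, nrow = 2))
toyGranges(se) <- data.frame(chr = c("chr1", "chr2"), start = c(150L, 250L))
toyGranges(se)$start
```

Defining the replacement next to the getter means `toyGranges(se) <- value` works with ordinary R assignment syntax, which is the symmetry asked for above.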

On Wed, Sep 19, 2012 at 3:17 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
> For extending SummarizedExperiments, it would be convenient to have
> something like Biobase::assayDataValidMembers.
>
> We might also consider putting Biobase::validMsg into BiocGenerics.
>
> Kasper
>
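A rough sketch of how such helpers could be used when extending a container class. `requireAssays()` and `appendMsg()` are hypothetical helpers written in the spirit of Biobase::assayDataValidMembers and Biobase::validMsg; they are not those functions, and the class `ToyBS` with required assays "M" and "Cov" is likewise hypothetical.

```r
library(methods)

# Hypothetical helper: TRUE if all required assay names are present,
# otherwise a character message naming the missing ones
requireAssays <- function(assays, required) {
  missing <- setdiff(required, names(assays))
  if (length(missing) == 0L) TRUE
  else paste0("missing assay(s): ", paste(missing, collapse = ", "))
}

# Hypothetical helper: accumulate a result only if it is a message
# (character); a TRUE result leaves 'msg' unchanged
appendMsg <- function(msg, result) {
  if (is.character(result)) c(msg, result) else msg
}

# Hypothetical extension class that requires "M" and "Cov" assays
setClass("ToyBS", slots = c(assays = "list"))
setValidity("ToyBS", function(object) {
  msg <- NULL
  msg <- appendMsg(msg, requireAssays(object@assays, c("M", "Cov")))
  if (is.null(msg)) TRUE else msg
})

ok <- new("ToyBS", assays = list(M = matrix(0, 2, 2), Cov = matrix(1, 2, 2)))
bad <- tryCatch(new("ToyBS", assays = list(M = matrix(0, 2, 2))),
                error = function(e) conditionMessage(e))
```

Returning either TRUE or a character vector is the convention S4 validity methods expect, which is why a message-accumulating helper is convenient when several checks are chained.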
> On Fri, Sep 14, 2012 at 5:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 09/14/2012 11:46 AM, Kasper Daniel Hansen wrote:
>>>
>>> On Fri, Sep 14, 2012 at 2:25 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>> wrote:
>>>>
>>>> For what it's worth I already wrote a CombineSEwithNAs() function to do
>>>> this on the disjoint ranges for RRBS.  It assumes that there isn't any
>>>> additional colData or elementMetadata of interest (for reasons that will
>>>> become clear) and further assumes that the user will want to smooth.
>>>
>>>
>>> I have one in bsseq as well (as I said earlier), but it would still
>>> be nice to think about the most general case possible.
>>>
>>>> seqlevels, seqlengths, and genome are implemented via seqinfo as of the
>>>> latest GenomicRanges package (this went in some time ago, thanks to MM):
>>>
>>>
>>> There is something I am missing here.  It clearly works.  But
>>> showMethods("genome") tells me that methods are defined for ANY and
>>> Seqinfo, yet if I use
>>>    example(SummarizedExperiment)
>>> to get sset defined,
>>>    is(sset, "Seqinfo")
>>> still returns FALSE.  I thought this would check for inheritance.
>>
>>
>> The 'ANY' method on genome is implemented so that if you have a
>> seqinfo,SummarizedExperiment-method, you get 'genome' for free. Another
>> example is 'rownames' and 'colnames', which are provided for free when a
>> dimnames,SummarizedExperiment-method is defined.
>>
>> > selectMethod("genome", "SummarizedExperiment")
>> Method Definition:
>>
>> function (x)
>> genome(seqinfo(x))
>> <environment: namespace:GenomicRanges>
>>
>> Signatures:
>>         x
>> target  "SummarizedExperiment"
>> defined "ANY"
>>
>> Martin
>>
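A small base-R sketch of the dispatch pattern Martin describes, which also shows why `is()` returns FALSE. All class and generic names here (`ToySeqinfo`, `ToyContainer`, `toySeqinfo`, `toyGenome`) are hypothetical stand-ins, not the GenomicRanges classes. The container never inherits from the seqinfo class; it merely reaches a generic "ANY" method that forwards through its seqinfo accessor.

```r
library(methods)

# Hypothetical stand-in for a Seqinfo-like class
setClass("ToySeqinfo", slots = c(genome = "character"))
# Hypothetical container that only knows how to return its seqinfo
setClass("ToyContainer", slots = c(si = "ToySeqinfo"))

setGeneric("toySeqinfo", function(x) standardGeneric("toySeqinfo"))
setMethod("toySeqinfo", "ToyContainer", function(x) x@si)

setGeneric("toyGenome", function(x) standardGeneric("toyGenome"))
# Concrete method on the seqinfo-like class itself
setMethod("toyGenome", "ToySeqinfo", function(x) x@genome)
# Fallback: any class with a toySeqinfo() method gets toyGenome() for free
setMethod("toyGenome", "ANY", function(x) toyGenome(toySeqinfo(x)))

cont <- new("ToyContainer",
            si = new("ToySeqinfo", genome = c(chr1 = "hg19", chr2 = "hg19")))
toyGenome(cont)                          # works via the ANY method
is(cont, "ToySeqinfo")                   # FALSE: no inheritance involved
selectMethod("toyGenome", "ToyContainer") # 'defined' signature is ANY
```

The design trades explicit inheritance for convention: any class that implements the seqinfo accessor picks up the derived accessors without declaring a relationship to the seqinfo class.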
>>
>>>
>>>> R> head(genome(LAML))
>>>>    chr1  chr10  chr11  chr12  chr13  chr14
>>>> "hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
>>>> R> head(seqlengths(LAML))
>>>>       chr1     chr10     chr11     chr12     chr13     chr14
>>>> 249250621 135534747 135006516 133851895 115169878 107349540
>>>>
>>>> I wrote a trivial addSeqinfo(x) function that, given a genome, will
>>>> populate an SE's seqinfo automatically from a BSgenome (if there is
>>>> one).  The function calls
>>>>
>>>>    rtracklayer:::SeqinfoForBSGenome(unique(na.omit(genome(x))))[seqlevels(x)]
>>>>
>>>> to get the correct information.
>>>>
>>>> I hate the fact that there can be NA or differing genomes specified per
>>>> chromosome for SummarizedExperiments.  It makes me sad.
>>>
>>>
>>> I don't like that you can mix hg18/hg19, but on the other hand we
>>> routinely spike in lambda phage, and that is not really part of the
>>> human genome.
>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 14, 2012 at 10:54 AM, Kasper Daniel Hansen
>>>> <kasperdanielhansen at gmail.com> wrote:
>>>>>
>>>>>
>>>>> Thanks for all the additional methods.  I still miss
>>>>>    seqlevels, seqlengths, genome
>>>>>
>>>>> Below,
>>>>>
>>>>> On Wed, Sep 12, 2012 at 3:33 PM, Martin Morgan <mtmorgan at fhcrc.org>
>>>>> wrote:
>>>>>>
>>>>>> On 09/12/2012 12:15 PM, Kasper Daniel Hansen wrote:
>>>>>
>>>>>
>>>>>>> One thing I have in my package that I find indispensable is combine
>>>>>>> and (my own) combineList.  The latter is for combining > 2 objects,
>>>>>>> which has a lot of possibilities for speed-up, especially if (as is
>>>>>>> very common) all the objects have the same rowData, as opposed to
>>>>>>> Reduce(combine, LIST).  Use case: you need to add additional samples
>>>>>>> to your SummarizedExperiment.
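A minimal sketch of the combineList fast path, assuming plain assay matrices whose row names stand in for rowData. `combineListSketch()` is hypothetical and ignores colData and metadata. When every input has identical rows, a single `do.call(cbind, ...)` replaces the length(LIST) - 1 pairwise combines of `Reduce(combine, LIST)`, each of which copies the growing result.

```r
# Hypothetical sketch; row names stand in for rowData
combineListSketch <- function(mats) {
  ids <- lapply(mats, rownames)
  ref <- ids[[1]]
  if (all(vapply(ids, identical, logical(1), ref))) {
    # Fast path (the common case): identical rows in identical order,
    # so bind everything in one call
    return(do.call(cbind, mats))
  }
  sameSet <- vapply(ids, function(r) length(r) == length(ref) && setequal(r, ref),
                    logical(1))
  if (all(sameSet)) {
    # Same rows in a different order: align to the first input, bind once
    return(do.call(cbind, lapply(mats, function(m) m[ref, , drop = FALSE])))
  }
  stop("row sets differ; rows must be expanded before combining")
}

a  <- matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("s1", "s2")))
b  <- matrix(5:6, 2, dimnames = list(c("r1", "r2"), "s3"))
c2 <- matrix(7:8, 2, dimnames = list(c("r2", "r1"), "s4"))
combined <- combineListSketch(list(a, b, c2))
```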
>>>>>>
>>>>>>
>>>>>>
>>>>>> I found it difficult in Biobase to write combine methods for eSet,
>>>>>> where you're really requiring a lot from the user (that the
>>>>>> phenoData / featureData be structured in the same way) or going
>>>>>> through contortions to make it the same in a reasonable-but-ad-hoc
>>>>>> way (e.g., when two columns are factors with the same set of levels
>>>>>> but encoded differently). Maybe the effort required is proportional
>>>>>> to the utility of the function provided...
>>>>>> I'll give it some more thought.
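A small base-R illustration of the factor pitfall Martin mentions: two columns with the same labels and the same set of levels, but with the levels stored in a different order, so the underlying integer codes disagree. The column values are made up for illustration.

```r
# Same labels, same level set, different level order
f1 <- factor(c("ctrl", "trt"), levels = c("ctrl", "trt"))
f2 <- factor(c("ctrl", "trt"), levels = c("trt", "ctrl"))

as.integer(f1)  # 1 2
as.integer(f2)  # 2 1

# Naive: pool the integer codes and decode them with f1's levels
naive <- levels(f1)[c(as.integer(f1), as.integer(f2))]

# Safer: combine by label, then impose one agreed level order
combined <- factor(c(as.character(f1), as.character(f2)), levels = levels(f1))
```

Here `naive` silently swaps the labels coming from `f2`, which is why a general combine method has to match levels by label rather than trusting the stored codes.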
>>>>>
>>>>>
>>>>> In the abstract case it is hard to imagine combining different
>>>>> SummarizedExperiments.  My use case is almost always "additional
>>>>> samples from the same experiment", and for that situation it is a lot
>>>>> easier to imagine combining them.  You still need to check that the
>>>>> granges are similar (and if not, expand some of the assayData with
>>>>> zeroes or NAs), since the new samples may have coverage in locations
>>>>> not assayed earlier.  Clearly, factors are hard to handle, and I assume
>>>>> there are other hard-to-handle cases.  Nevertheless, I find such a
>>>>> function incredibly useful.
>>>>>
>>>>> I think it is entirely OK to assume that the user knows what (s)he is
>>>>> doing.
>>>>>
>>>>> Kasper
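A sketch of the expansion step Kasper describes: when new samples cover loci not assayed earlier, take the union of rows and pad each input with a fill value before binding. `padAndCombine()` is a hypothetical helper operating on plain matrices, with row names standing in for genomic locations.

```r
# Hypothetical helper: bind two assay matrices whose rows (loci) only
# partly overlap, filling loci a sample was not assayed at
padAndCombine <- function(a, b, fill = NA) {
  rows <- union(rownames(a), rownames(b))
  expand <- function(m) {
    out <- matrix(fill, nrow = length(rows), ncol = ncol(m),
                  dimnames = list(rows, colnames(m)))
    out[rownames(m), ] <- m  # copy observed values into their rows
    out
  }
  cbind(expand(a), expand(b))
}

old   <- matrix(c(3L, 5L), 2, dimnames = list(c("chr1:100", "chr1:200"), "s1"))
newer <- matrix(c(4L, 7L), 2, dimnames = list(c("chr1:200", "chr1:300"), "s2"))
res <- padAndCombine(old, newer)
```

Which fill value is appropriate depends on the assay: `fill = 0` suits count-like data such as coverage, where "not assayed" genuinely means no reads, while NA is the safer choice for estimates that simply were not computed.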
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> A model is a lie that helps you see the truth.
>>>>
>>>> Howard Skipper
>>>>
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793


