[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

Martin Morgan mtmorgan at fhcrc.org
Sat Oct 11 19:49:52 CEST 2014


On 10/11/2014 08:41 AM, Vincent Carey wrote:
> Is there anything on the order of as([GeneSet], "GRanges") around?

No, I don't think so; it would obviously be of use, and it follows a common theme.

Martin

>
> On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>
>> Sean and Vincent,
>>
>> The goal of what we are doing builds off of what Martin has in GSEABase.
>> We were looking to see how much benefit we can get with something
>> lighter-weight that lies between indistinguishable character vectors and
>> the full machinery of GeneSets.
>>
>> Either way, it seems like formalizing the semantic information is a way to
>> do what you want. Furthermore, these classed id objects can be created
>> automatically when there is contextual information e.g. during queries to
>> databases (or db-like objects), and then simply added to metadata
>> DataFrames and re-used.
>>
>> ~G
>>
>>
>>
>>
>> On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>>>
>>>
>>> On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com>
>>> wrote:
>>>
>>>> Hey all,
>>>>
>>>> We are in the (very) early stages of experimenting with something that
>>>> seems relevant here: classed identifiers. We are using them for
>>>> database/mart queries, but the same concept could be useful for the cases
>>>> you're describing I think.
>>>>
>>>> E.g.
>>>>
>>>>> mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>>>>> mysyms
>>>> An object of class "GeneSymbol"
>>>> [1] "BRAF"  "BRCA1"
>>>>> yourSE[mysyms, ]
>>>> ...
>>>>
>>>>
>>> This approach has the flavor of some of the functionality that Martin put
>>> together for the GSEABase package (EntrezIdentifier, etc.).
>>>
>>> Sean
>>>
>>>
>>>
>>>>
>>>> This approach has the benefit of being declarative instead of heuristic
>>>> (people won't be able to accidentally invoke it), while still giving most
>>>> of the convenience I believe you are looking for.
>>>>
>>>> The object classes inherit directly from character, so should "just work"
>>>> most of the time, but as I said it's early days; lots more testing for
>>>> functionality and usefulness is needed.
>>>>
>>>> ~G
>>>>
>>>>
>>>> On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>>
>>>>> OK by me to leave [ alone.  We could start with subsetByEntrez,
>>>>> subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID.
>>>>>
>>>>> Utilities to generate GRanges for queries in each of these vocabularies
>>>>> should, perhaps, be in the OrganismDb space?  Once those are in place
>>>>> no additional infrastructure is necessary?
>>>>>
>>>>> On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>>
>>>>>> Agreed with Sean, having tried implementing the "magical" alternative.
>>>>>>
>>>>>> --t
>>>>>>
>>>>>>> On Sep 20, 2014, at 9:31 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>>>>>
>>>>>>> Hi, Vince.
>>>>>>>
>>>>>>> I'm coming a little late to the party, but I agree with Kasper's
>>>>>>> sentiment that the less "magical" approach of using subsetByXXX might
>>>>>>> be the cleaner way to go for the time being.
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw
>>>>>>>>
>>>>>>>> shows some modifications to [ that allow subsetting of an SE by
>>>>>>>> gene or pathway name.
>>>>>>>>
>>>>>>>> It may be premature to work at the [ level. Kasper suggested
>>>>>>>> defining a suite of subsetBy operations that would accomplish this.
>>>>>>>>
>>>>>>>> I think we could get something along these lines into the release
>>>>>>>> without too much more work. Votes?
>>>>>>>>
>>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Computational Biologist
>>>> Genentech Research
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
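
As a rough illustration of the two options discussed in the quoted thread (dispatching [ on a classed identifier versus an explicit subsetByXXX function), here is a minimal, self-contained sketch using only base R and the methods package. ToySE is a hypothetical stand-in for SummarizedExperiment, and the GeneSymbol class and subsetBySymbol function defined here are assumptions made for illustration; they are not the implementations Gabe or Vincent describe, and not the GSEABase classes.

```r
library(methods)

# Hypothetical classed identifier: a character vector that declares
# its contents to be gene symbols.
setClass("GeneSymbol", contains = "character")
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))

# Toy stand-in for a SummarizedExperiment: an assay matrix plus one
# gene symbol per row (assumed layout, for illustration only).
setClass("ToySE", slots = c(assay = "matrix", symbol = "character"))

# Declarative dispatch: this method is selected only when the row index
# is a GeneSymbol, so rows are looked up by symbol, not by row name.
setMethod("[", signature(x = "ToySE", i = "GeneSymbol"),
    function(x, i, j, ..., drop = TRUE) {
        keep <- which(x@symbol %in% i@.Data)
        new("ToySE",
            assay = x@assay[keep, , drop = FALSE],
            symbol = x@symbol[keep])
    })

# The explicit subsetByXXX style: same lookup, no change to how the
# caller spells the index.
subsetBySymbol <- function(x, symbols) x[GeneSymbol(symbols), ]

se <- new("ToySE",
    assay = matrix(1:6, nrow = 3,
        dimnames = list(c("673", "672", "7157"), c("s1", "s2"))),
    symbol = c("BRAF", "BRCA1", "TP53"))

mysyms <- GeneSymbol(c("BRAF", "BRCA1"))
sub <- se[mysyms, ]
rownames(sub@assay)
```

In this sketch a plain character index never reaches the symbol lookup, since the method is registered only for GeneSymbol; that is the "declarative rather than heuristic" property mentioned in the thread.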


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


