[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

Vincent Carey stvjc at channing.harvard.edu
Sat Oct 11 23:25:33 CEST 2014


On Sat, Oct 11, 2014 at 5:17 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:

> But what would it do exactly?
>
> Probably would want to be able to extract a gene list from a TxDb, then
> extract the desired type of structure from the TxDb.
>
> Not too bad right now, but it would be nice to leverage the identifier
> type information on the gene list object.
>
> Currently:
> tx <- transcripts(txdb, vals=list(gene_id=genes))
>
> Proposed:
> tx <- transcripts(txdb[GeneList])
>

yes, that makes sense.  i don't go to txdb's as naturally as i should.
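
To make the classed-identifier idea quoted below concrete, here is a minimal base-R sketch. Everything in it is an assumption for illustration: the `GeneSymbol` class is a stand-in for the lightweight class Gabe describes (not GSEABase or any released code), `subsetById` and its `symbols` argument are made-up names, and a plain matrix stands in for a SummarizedExperiment.

```r
library(methods)

# Hypothetical classed identifier: a character vector that also records
# which vocabulary its values belong to.
setClass("GeneSymbol", contains = "character")
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))

# Explicit (non-"magical") subsetting generic; the name is illustrative.
setGeneric("subsetById", function(x, ids, ...) standardGeneric("subsetById"))

# The method is selected by the identifier class, so a plain character
# vector cannot trigger symbol lookup by accident.
setMethod("subsetById", signature("matrix", "GeneSymbol"),
          function(x, ids, symbols, ...) {
            # symbols: named character vector mapping row names to symbols
            hits <- match(as.character(ids), symbols[rownames(x)])
            x[hits[!is.na(hits)], , drop = FALSE]
          })

# Toy stand-in for an SE: rows keyed by Entrez ID, columns are samples.
toy <- matrix(1:6, nrow = 3,
              dimnames = list(c("673", "672", "7157"), c("s1", "s2")))
sym <- c("673" = "BRAF", "672" = "BRCA1", "7157" = "TP53")

mysyms <- GeneSymbol(c("BRAF", "BRCA1"))
sub <- subsetById(toy, mysyms, symbols = sym)
rownames(sub)  # the rows keyed "673" and "672"
```

Because dispatch is on the class of `ids`, calling `subsetById(toy, "BRAF", symbols = sym)` with a bare character vector fails with a missing-method error instead of guessing the vocabulary, which is the declarative behaviour the thread is after.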


>
>
>
> On Sat, Oct 11, 2014 at 10:49 AM, Martin Morgan <mtmorgan at fhcrc.org>
> wrote:
>
>> On 10/11/2014 08:41 AM, Vincent Carey wrote:
>>
>>> Is there anything on the order of as([GeneSet], "GRanges") around?
>>>
>>
>> no, I don't think so; obviously of use and following a common theme.
>> Martin
>>
>>
>>
>>> On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.gabe at gene.com>
>>> wrote:
>>>
>>>> Sean and Vincent,
>>>>
>>>> The goal of what we are doing builds off of what Martin has in GSEABase.
>>>> We were looking to see how much benefit we can get with something
>>>> lighter-weight that lies between indistinguishable character vectors and
>>>> the full machinery of GeneSets.
>>>>
>>>> Either way, it seems like formalizing the semantic information is a way to
>>>> do what you want. Furthermore, these classed id objects can be created
>>>> automatically when there is contextual information, e.g. during queries to
>>>> databases (or db-like objects), and then simply added to metadata
>>>> DataFrames and re-used.
>>>>
>>>> ~G
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdavis2 at mail.nih.gov>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com>
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> We are in the (very) early stages of experimenting with something that
>>>>>> seems relevant here: classed identifiers. We are using them for
>>>>>> database/mart queries, but the same concept could be useful for the
>>>>>> cases you're describing, I think.
>>>>>>
>>>>>> E.g.
>>>>>>
>>>>>> > mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>>>>>> > mysyms
>>>>>> An object of class "GeneSymbol"
>>>>>> [1] "BRAF"  "BRCA1"
>>>>>> > yourSE[mysyms, ]
>>>>>> ...
>>>>>>
>>>>>>
>>>>> This approach has the flavor of some of the functionality that Martin put
>>>>> together for the GSEABase package (EntrezIdentifier, etc.).
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> This approach has the benefit of being declarative instead of heuristic
>>>>>> (people won't be able to accidentally invoke it), while still giving most
>>>>>> of the convenience I believe you are looking for.
>>>>>>
>>>>>> The object classes inherit directly from character, so should "just work"
>>>>>> most of the time, but as I said it's early days; lots more testing for
>>>>>> functionality and usefulness is needed.
>>>>>>
>>>>>> ~G
>>>>>>
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>>>>
>>>>>>> OK by me to leave [ alone.  We could start with subsetByEntrez,
>>>>>>> subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID.
>>>>>>>
>>>>>>> Utilities to generate GRanges for queries in each of these vocabularies
>>>>>>> should, perhaps, be in the OrganismDb space?  Once those are in place
>>>>>>> no additional infrastructure is necessary?
>>>>>>>
>>>>>>> On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>>>>
>>>>>>>> Agreed with Sean, having tried implementing the "magical" alternative
>>>>>>>>
>>>>>>>> --t
>>>>>>>>
>>>>>>>> On Sep 20, 2014, at 9:31 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>>>>>>>
>>>>>>>>> Hi, Vince.
>>>>>>>>>
>>>>>>>>> I'm coming a little late to the party, but I agree with Kasper's sentiment
>>>>>>>>> that the less "magical" approach of using subsetByXXX might be the cleaner
>>>>>>>>> way to go for the time being.
>>>>>>>>>
>>>>>>>>> Sean
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>>>>>>>
>>>>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw
>>>>>>>>>>
>>>>>>>>>> shows some modifications to [ that allow subsetting of SE by
>>>>>>>>>> gene or pathway name
>>>>>>>>>>
>>>>>>>>>> it may be premature to work at the [ level.  Kasper suggested defining
>>>>>>>>>> a suite of subsetBy operations that would accomplish this
>>>>>>>>>>
>>>>>>>>>> i think we could get something along these lines into the release without
>>>>>>>>>> too much more work.  votes?
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Computational Biologist
>>>>>> Genentech Research
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Computational Biologist
>>>> Genentech Research
>>>>
>>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>
>
