The difference is in the implementation: GRangesList is compressed;
GenomicRangesList is not.

Efficiency: GRangesList compresses the data, i.e., it concatenates all the
elements into a single GRanges. Your code then loops over the elements,
which forces each one to be extracted again. GenomicRangesList would keep
everything as an ordinary list internally, so there is nothing to extract.
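
To make the efficiency point concrete, here is a rough base-R analogy. It is
only a sketch of the idea, not the actual GenomicRanges internals, and the
names elements, compressed, ends and get_element are made up for
illustration. A compressed list stores one concatenated vector plus the end
offset of each element, so every element access has to slice it back out:

```r
# Three list elements of different lengths (stand-ins for three GRanges).
elements <- list(a = c(10L, 20L), b = c(30L, 40L, 50L), c = 60L)

# "Compressed" representation: one concatenated vector plus the
# cumulative end position of each element.
compressed <- unlist(elements, use.names = FALSE)
ends <- cumsum(lengths(elements))

# Extracting element i means recomputing its start and slicing.
get_element <- function(i) {
  start <- if (i == 1L) 1L else ends[[i - 1L]] + 1L
  compressed[seq.int(start, length.out = ends[[i]] - start + 1L)]
}

# Looping over the compressed form pays the extraction cost each time;
# looping over the ordinary list just hands back what is already stored.
from_compressed <- lapply(seq_along(ends), get_element)
names(from_compressed) <- names(elements)
```

Both forms hold the same data; the difference is only in how much work each
element access costs when you lapply() over the list.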

Generality: If you're grabbing arbitrary tracks from AnnotationHub or
anywhere else, they might not all have the same mcols. GRangesList requires
that the set of mcols be the same for all elements, because they all end up
in the same GRanges.
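
The same constraint can be sketched in base R, using plain data frames as
stand-ins for GRanges with different mcols. This is an analogy only; the
objects track1, track2, combined and as_list are hypothetical, and the values
are just for illustration. Stacking everything into one table fails when the
columns differ, while an ordinary list does not care:

```r
# Two "tracks" with different metadata columns (score vs. pValue).
track1 <- data.frame(start = c(1L, 50L), end = c(10L, 60L),
                     score = c(579L, 531L))
track2 <- data.frame(start = 5L, end = 25L, pValue = 2.4)

# Compressed-style storage needs one combined table; rbind() refuses
# because the column names do not match, so we record NULL instead.
combined <- tryCatch(rbind(track1, track2), error = function(e) NULL)

# An ordinary list places no such constraint on its elements.
as_list <- list(track1 = track1, track2 = track2)
```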




On Thu, Oct 24, 2013 at 9:30 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:

> > GenomicRangesList() should be used here instead of GRangesList, for
> efficiency, generality and perhaps semantics.
>
> What makes GenomicRangesList() more general or efficient?  I did not
> realize that I should be doing this.
>
> Thanks,
>
> --t
>
>
> *He that would live in peace and at ease, *
> *Must not speak all he knows, nor judge all he sees.*
> *
> *
> Benjamin Franklin, Poor Richard's Almanack<http://archive.org/details/poorrichardsalma00franrich>
>
>
> On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence <
> lawrence.michael@gene.com> wrote:
>
>>
>>
>>
>> On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>
>>> ps.  Why +/- 100kb?  That's an awful lot of padding given that tons of
>>> the genome falls into h3k4me1 peaks
>>>
>>> On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>>
>>>> If I'm guessing right, something like this... ?
>>>>
>>>> grset <- readRDS("grset.rds")
>>>> show(grset)
>>>> ##
>>>> ## class: GenomicRatioSet
>>>> ## dim: 468211 32
>>>> ## exptData(0):
>>>> ## assays(2): M CN
>>>> ## ...
>>>> ##
>>>> highVar <- names(which(rowData(grset)$varByGroupQval < 0.05))
>>>> ##
>>>> ## about 50 probes, here
>>>> ##
>>>> ## could also use FDb.InfiniumMethylation.hg19 if not already mapped
>>>>
>>>> ## pad y bases on each side; fix="center" keeps the padding symmetric
>>>> grow <- function(x, y) resize(x, width(x) + (2*y), fix="center")
>>>> probes <- grow(granges(grset)[highVar], 1e5) ## +/- 100kb
>>>>
>>>>
>> This grow function is already available as the "+" operator:
>> granges(grset)[highVar] + 1e5
>>
>> If people like an alias like "grow" or "widen", we should consider adding
>> it.
>>
>>>> require(AnnotationHub)
>>>> hub = AnnotationHub()
>>>> m = metadata(hub)
>>>> ##
>>>> ## ...time passes...
>>>> ##
>>>>
>>>> histoneMarks <- c('k27ac','k4me1','k4me3')
>>>> names(histoneMarks) <- histoneMarks
>>>>
>>>> pre <- 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone'
>>>> post <- 'StdPk.broadPeak_0.0.1.RData'
>>>> gm12878 <- GRangesList(lapply(histoneMarks, function(x)
>>>>   hub[[paste0(pre, 'Gm12878H3', x, post)]]))
>>>>
>>>>
>> I kind of think that GenomicRangesList() should be used here instead of
>> GRangesList, for efficiency, generality and perhaps semantics.
>>
>>
>>>> lapply(gm12878, function(x) names(subsetByOverlaps(probes, x)))
>>>> ## $k27ac
>>>> ## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020"
>>>> ## ...
>>>> ##
>>>> ## $k4me1
>>>> ## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956"
>>>> ## ...
>>>> ##
>>>> ## $k4me3
>>>> ## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649"
>>>> ## ...
>>>>
>>>> Is that pretty similar to what you were thinking?  The rest will be an
>>>> issue of hunt-and-peck; you could also use countOverlaps, though it won't
>>>> make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active
>>>> enhancers.
>>>>
>>>> hope this helps,
>>>>
>>>> --t
>>>>
>>>> On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> wrote:
>>>>
>>>>> Thanks much for the help. Will have a go and let you know.
>>>>> I have about 80 probes, from many different genes. I'm not sure if
>>>>> they can be summarized, but sure it's worth having a look.
>>>>>
>>>>> BR,
>>>>> Khadeeja
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, October 24, 2013 8:53 PM, Martin Morgan <
>>>>> mtmorgan@fhcrc.org> wrote:
>>>>>
>>>>> On 10/24/2013 09:37 AM, khadeeja ismail wrote:
>>>>> >
>>>>> >
>>>>> > Hi,
>>>>> > I am working with some 450k array probes that I need to look up in the
>>>>> > Genome Browser to see what types of regions they are located in, for
>>>>> > example whether the CpG site (+/- 100kb) overlaps with any of the
>>>>> > following in the GM12878 track.
>>>>> >
>>>>> >
>>>>> > Layered H3K27Ac
>>>>> > Layered H3K4Me1
>>>>> > Layered H3K4Me3
>>>>> > Transcription
>>>>> > DNase Clusters
>>>>> > DNase Clusters V1
>>>>> > Txn Fac ChIP V3
>>>>> > Txn Factor ChIP
>>>>>
>>>>> These tracks are available in AnnotationHub
>>>>>
>>>>>    library(AnnotationHub)
>>>>>    hub = AnnotationHub()
>>>>>    m = metadata(hub)
>>>>>
>>>>> and then
>>>>>
>>>>> > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)])
>>>>> [1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk"
>>>>> [2] "wgEncodeBroadHistoneNhaH3k27acStdPk"
>>>>> [3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk"
>>>>> [4] "wgEncodeBroadHistoneK562H3k27acStdPk"
>>>>> [5] "wgEncodeBroadHistoneGm12878H3k27acStdPk"
>>>>> [6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk"
>>>>>
>>>>> > xx = hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData
>>>>> Retrieving 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData'
>>>>>
>>>>> > head(xx)
>>>>> GRanges with 6 ranges and 5 metadata columns:
>>>>>       seqnames               ranges strand |        name     score signalValue    pValue    qValue
>>>>>          <Rle>            <IRanges>  <Rle> | <character> <integer>   <numeric> <numeric> <numeric>
>>>>>   [1]    chr22 [17091048, 17091199]      * |           .       579   11.651761       2.4        -1
>>>>>   [2]    chr22 [17305774, 17306441]      * |           .       531   10.111585      15.4        -1
>>>>>   [3]    chr22 [17517314, 17517945]      * |           .       527    9.991400     100.0        -1
>>>>>   [4]    chr22 [17518132, 17518819]      * |           .       837   19.847850      15.3        -1
>>>>>   [ reached getOption("max.print") -- omitted 2 rows ]
>>>>>
>>>>> and then it is ready for findOverlaps or other GRanges operations.
>>>>> There's a vignette in AnnotationHub
>>>>>
>>>>>
>>>>> http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html
>>>>>
>>>>> and it is mentioned in the annotation workflow; the AnnotatingRanges
>>>>> workflow is also relevant
>>>>>
>>>>>   http://bioconductor.org/help/workflows/annotation/annotation/
>>>>>   http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/
>>>>>
>>>>> It would be interesting and useful to have this as a stand-alone
>>>>> workflow, so if you do pursue this route and are interested in writing
>>>>> up a workflow then let me know...
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>> >
>>>>> >
>>>>> > I would like to do it as a batch and not one by one, since the list of
>>>>> > probes is long. I have tried querying the Genome Browser database and
>>>>> > also the rtracklayer package in R, but have not been successful. It
>>>>> > would be great if anyone could give me ideas on how it can be done.
>>>>> >
>>>>> > Thanking you,
>>>>> > Khadeeja
>>>>> >
>>>>> >
>>>>> >
>>>>> > _______________________________________________
>>>>> > Bioconductor mailing list
>>>>> > Bioconductor@r-project.org
>>>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> > Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N.
>>>>> PO Box 19024 Seattle, WA 98109
>>>>>
>>>>> Location: Arnold Building M1 B861
>>>>> Phone: (206) 667-2793
>>>>
>>>
>>
>


