Hi,

Thanks for a lot of help and an interesting discussion. IT has been very useful.
I ran the code and was able to get the overlaps for the histone markers, but still need to try the same for the tnx, and dnase etc. Hope no problems pop up. Btw, I meant 100bp, and not 100kb.
Hector, I had some problem with the browser, but will try again later and let you know. Need to spend some more time on this.

Thanks again, and BR,
Khadeeja


On Friday, October 25, 2013 5:54 PM, "Tim Triche, Jr." <tim.triche@gmail.com> wrote:
 
Got it. Thanks for the implementation details -- I think this can speed up a number of other functions of mine too. Much obliged!

--t

On Oct 25, 2013, at 4:49 AM, Michael Lawrence <lawrence.michael@gene.com> wrote:


It's the implementation: GRangesList is compressed; GenomicRangesList is not.
>
>
>Efficient: GRangesList compresses the data, i.e., it concatenates all the objects. Your code proceeds to loop over the elements, which needs to extract them. GenomicRangesList would keep everything as an ordinary list internally.
>
>General: If you're grabbing arbitrary tracks from AnnotationHub or anywhere else, they might not all have the same mcols; GRangesList requires that the set of mcols is the same for all objects (because they all end up in the same GRanges).
>
>
>
>
>
>
>On Thu, Oct 24, 2013 at 9:30 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>
>> GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics.
>>
>>
>>What makes GenomicRangesList() more general or efficient?  I did not realize that I should be doing this.
>>
>>
>>Thanks,
>>
>>
>>--t
>>
>>
>>
>>
>>He that would live in peace and at ease, 
>>Must not speak all he knows, nor judge all he sees.
>>
>>
>>
>>Benjamin Franklin, Poor Richard's Almanack
>>
>>
>>On Thu, Oct 24, 2013 at 3:36 PM, Michael Lawrence <lawrence.michael@gene.com> wrote:
>>
>>
>>>
>>>
>>>
>>>
>>>On Thu, Oct 24, 2013 at 2:54 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>>
>>>ps.  Why +/- 100kb?  That's an awful lot of padding given that tons of the genome falls into h3k4me1 peaks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>He that would live in peace and at ease, 
>>>>Must not speak all he knows, nor judge all he sees.
>>>>
>>>>
>>>>
>>>>Benjamin Franklin, Poor Richard's Almanack
>>>>
>>>>
>>>>On Thu, Oct 24, 2013 at 2:52 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>>>
>>>>If I'm guessing right, something like this... ?
>>>>>
>>>>>
>>>>>grset <- readRDS("grset.rds")
>>>>>show(grset)
>>>>>##
>>>>>## class: GenomicRatioSet 
>>>>>## dim: 468211 32 
>>>>>## exptData(0):
>>>>>## assays(2): M CN
>>>>>## ...
>>>>>##
>>>>>highVar <- names(which(rowData(grset)$varByGroupQval < 0.05))
>>>>>##
>>>>>## about 50 probes, here
>>>>>##
>>>>>## could also use FDb.InfiniumMethylation.hg19 if not already mapped
>>>>>
>>>>>
>>>>>grow <- function(x, y) resize(x, width(x) + (2*y))
>>>>>
>>>>>probes <- grow(granges(grset)[highVar], 10e5) ## +/- 100kb
>>>>>
>>>>>
>>>
>>>
>>>This grow function is currently implemented as:
>>>
>>>granges(grset)[highVar] + 1e5
>>> 
>>>
>>>If people like an alias like "grow" or "widen", we should consider adding it.
>>>
>>>
>>>require(AnnotationHub)
>>>>>hub = AnnotationHub()
>>>>>m = metadata(hub)
>>>>>## 
>>>>>## ...time passes...
>>>>>## 
>>>>>
>>>>>
>>>>>histoneMarks <- c('k27ac','k4me1','k4me3')
>>>>>names(histoneMarks) <- histoneMarks
>>>>>
>>>>>
>>>>>pre <- 'goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistone'
>>>>>post <- 'StdPk.broadPeak_0.0.1.RData'
>>>>>gm12878 <- GRangesList(lapply(histoneMarks,
>>>>>                              function(x)
>>>>>                                hub[[paste0(pre, 'Gm12878H3', x, post)]]))
>>>>>
>>>>>
>>>
>>>
>>>I kind of think that GenomicRangesList() should be used here instead of GRangesList, for efficiency, generality and perhaps semantics.
>>> 
>>>
>>>lapply(gm12878, function(x) names(subsetByOverlaps(probes, x)))
>>>>>## $k27ac
>>>>>## [1] "cg07238657" "cg06431905" "cg14555649" "cg00031967" "cg10311020"
>>>>>## ...
>>>>>##
>>>>>## $k4me1
>>>>>## [1] "cg25243082" "cg06431905" "cg00031967" "cg10311020" "cg05482956"
>>>>>## ...
>>>>>##
>>>>>## $k4me3
>>>>>## [1] "cg16220844" "cg24991732" "cg07238657" "cg06431905" "cg14555649"
>>>>>## ...
>>>>>
>>>>>
>>>>>Is that pretty similar to what you were thinking?  The rest will be an issue of hunt-and-peck; you could also use countOverlaps, though it won't make it as easy to e.g. intersect h3k27ac and h3k4me1 to find active enhancers.
>>>>>
>>>>>
>>>>>hope this helps,
>>>>>
>>>>>
>>>>>--t
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>He that would live in peace and at ease, 
>>>>>Must not speak all he knows, nor judge all he sees.
>>>>>
>>>>>
>>>>>
>>>>>Benjamin Franklin, Poor Richard's Almanack
>>>>>
>>>>>
>>>>>
>>>>>On Thu, Oct 24, 2013 at 11:21 AM, khadeeja ismail <hajjja@yahoo.com> wrote:
>>>>>
>>>>>Thanks much for the help. Will have a go and let you know.
>>>>>>I have about 80 probes, from many different genes. I'm not sure if they can be summarized, but sure it's worth having a look.
>>>>>>
>>>>>>BR,
>>>>>>Khadeeja
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Thursday, October 24, 2013 8:53 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
>>>>>>
>>>>>>On 10/24/2013 09:37 AM, khadeeja ismail wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>> I am working  with some 450k array probes which I need to look up in Geneome browser to see in which type of areas these probes are located in.  For example, if the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track.
>>>>>>>
>>>>>>>
>>>>>>> Layered H3K27Ac
>>>>>>> Layered H3K4Me1
>>>>>>> Layered H3K4Me3
>>>>>>> Transcription
>>>>>>> DNase Clusters
>>>>>>> DNase Clusters V1
>>>>>>> Txn Fac ChIP V3
>>>>>>> Txn Factor ChIP
>>>>>>
>>>>>>These tracks are available in AnnotationHub
>>>>>>
>>>>>>   library(AnnotationHub)
>>>>>>   hub = AnnotationHub()
>>>>>>   m = metadata(hub)
>>>>>>
>>>>>>and then
>>>>>>
>>>>>>> head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)])
>>>>>>[1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk"
>>>>>>[2] "wgEncodeBroadHistoneNhaH3k27acStdPk"
>>>>>>[3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk"
>>>>>>[4] "wgEncodeBroadHistoneK562H3k27acStdPk"
>>>>>>[5] "wgEncodeBroadHistoneGm12878H3k27acStdPk"
>>>>>>[6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk"
>>>>>>
>>>>>>> xx =
>>>>>>hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData
>>>>>>Retrieving
>>>>>>'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData'
>>>>>>
>>>>>>> head(xx)
>>>>>>GRanges with 6 ranges and 5 metadata columns:
>>>>>>       seqnames               ranges strand |        name     score signalValue
>>>>>>          <Rle>            <IRanges>  <Rle> | <character> <integer>   <numeric>
>>>>>>   [1]    chr22 [17091048, 17091199]      * |           .       579   11.651761
>>>>>>   [2]    chr22 [17305774, 17306441]      * |           .       531   10.111585
>>>>>>   [3]    chr22 [17517314, 17517945]      * |           .       527    9.991400
>>>>>>   [4]    chr22 [17518132, 17518819]      * |           .       837   19.847850
>>>>>>          pValue    qValue
>>>>>>       <numeric> <numeric>
>>>>>>   [1]       2.4        -1
>>>>>>   [2]      15.4        -1
>>>>>>   [3]     100.0        -1
>>>>>>   [4]      15.3        -1
>>>>>>  [ reached getOption("max.print") -- omitted 2 rows ]
>>>>>>
>>>>>>and then ready for findOverlaps or other GRanges operations. There's a vignette
>>>>>>in AnnotationHub
>>>>>>
>>>>>>  http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html
>>>>>>
>>>>>>and it is mentioned in the work flow on annotation and AnnotatingRanges work
>>>>>>flows are relevant
>>>>>>
>>>>>>  http://bioconductor.org/help/workflows/annotation/annotation/
>>>>>>  http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/
>>>>>>
>>>>>>It would be interesting and useful to have this as a stand-alone work flow, so
>>>>>>if you do pursue this root and are interested in writing up a workflow then let
>>>>>>me know...
>>>>>>
>>>>>>Martin
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would like to do it as batch and not one by one since the list of probes is long. I have tried querying the GenomeBrowser database and also the rtracklayer package in R but have not been successful. Would be great if anyone can give me any ideas on how it can be done.
>>>>>>>
>>>>>>> Thanking you,
>>>>>>> Khadeeja
>>>>>>>     [[alternative HTML version deleted]]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor@r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>>
>>>>>>--
>>>>>>Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>1100 Fairview Ave. N.
>>>>>>PO Box 19024 Seattle, WA 98109
>>>>>>
>>>>>>Location: Arnold Building M1 B861
>>>>>>Phone: (206) 667-2793
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioconductor mailing list
>>>>>>Bioconductor@r-project.org
>>>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
	[[alternative HTML version deleted]]