[Bioc-devel] annoSet?
Tim Triche, Jr.
tim.triche at gmail.com
Wed Aug 12 23:54:17 CEST 2015
Two thoughts:
1) The ChromImpute calls come from individually ensemble-imputed tracks,
which together suggest a state N with probability p, such that p(N) =
1 - p(!N); in some cases p may be well below 1 for a given span of
~200bp. The uncertainty in state assignments is actually of interest, just
as it was with chromHMM, but storing it is messy because it needs a much
larger data structure than the seqname-start-end segment calls. It is,
however, informative about differences between (or within, cf. scATAC)
cell types. This is something I probably should have developed further in
chromophobe.
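To make the storage trade-off concrete, here is a minimal Python sketch, assuming per-bin posterior probabilities over a fixed set of states are available as plain lists (rows summing to 1). It collapses runs of bins with the same most probable state into seqname-start-end segments and keeps only two uncertainty numbers per segment instead of the full bins-by-states matrix. `BIN_SIZE`, `Segment` and `collapse_posteriors` are hypothetical names, not part of ChromImpute, chromHMM or chromophobe.

```python
from dataclasses import dataclass
from typing import List, Sequence

BIN_SIZE = 200  # assumption: ~200bp bins, as mentioned above


@dataclass
class Segment:
    seqname: str
    start: int     # 0-based, inclusive
    end: int       # exclusive
    state: str     # most probable state for every bin in the run
    min_p: float   # lowest p(state) across the run's bins
    mean_p: float  # average p(state) across the run's bins


def collapse_posteriors(seqname: str, states: Sequence[str],
                        posteriors: Sequence[Sequence[float]]) -> List[Segment]:
    """posteriors[i][k] is p(bin i is in state k). Consecutive bins with the
    same most probable state become one segment; 1 - min_p is then the
    largest p(!N) seen inside that segment."""
    segments: List[Segment] = []
    run_state = ""
    run_start = 0
    run_ps: List[float] = []

    def flush(end_bin: int) -> None:
        # Reads the current run variables from the enclosing scope.
        segments.append(Segment(seqname, run_start * BIN_SIZE, end_bin * BIN_SIZE,
                                run_state, min(run_ps), sum(run_ps) / len(run_ps)))

    for i, row in enumerate(posteriors):
        k = max(range(len(row)), key=lambda j: row[j])  # index of the MAP state
        if run_ps and states[k] != run_state:
            flush(i)
            run_ps = []
        if not run_ps:
            run_state, run_start = states[k], i
        run_ps.append(row[k])
    if run_ps:
        flush(len(posteriors))
    return segments
```

With states `["EnhA", "TssA", "Quies"]` and bin posteriors `[0.9, 0.05, 0.05]`, `[0.6, 0.3, 0.1]`, `[0.2, 0.7, 0.1]`, `[0.1, 0.1, 0.8]`, the first two bins collapse into one EnhA segment covering 0-400 with `min_p` 0.6, i.e. p(!N) reaches 0.4 inside a call that the segment table alone would report with no uncertainty at all.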
2) At some point much of this question devolves into peak calling, i.e.
what is the exemplar distribution for state N as a multivariate Bernoulli
(say H3K27ac:1, H3K4me1:1, H3K4me3:0, H3K27me3:0, DNAm:0, DHS:1 for an
active enhancer)? The original and still reasonable motivation for using
an HMM or factorial HMM to "discover" underlying states seems to have
fallen by the wayside, for better or worse, so storing the marginal
probability that a given span is called "present" or "absent" for each
mark might work fine.
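A rough sketch of how the per-mark marginals could be scored against an exemplar like the active-enhancer pattern above, in Python. It assumes marks are independent given the span (a strong simplification a factorial HMM would not make) and that a marginal p(present) is available for each mark; `ACTIVE_ENHANCER` copies the example pattern, while `p_matches_exemplar` is a hypothetical helper.

```python
from math import prod
from typing import Dict

# Binary exemplar for an active enhancer, taken from the example above.
ACTIVE_ENHANCER: Dict[str, int] = {"H3K27ac": 1, "H3K4me1": 1, "H3K4me3": 0,
                                   "H3K27me3": 0, "DNAm": 0, "DHS": 1}


def p_matches_exemplar(p_present: Dict[str, float], exemplar: Dict[str, int]) -> float:
    """Probability that the span shows exactly the exemplar's presence/absence
    pattern, assuming independent marks: multiply p(present) where the
    exemplar has 1 and 1 - p(present) where it has 0."""
    return prod(p_present[mark] if wanted else 1.0 - p_present[mark]
                for mark, wanted in exemplar.items())
```

For a span with marginals H3K27ac 0.9, H3K4me1 0.8, H3K4me3 0.1, H3K27me3 0.05, DNAm 0.2 and DHS 0.95 this gives 0.81 x 0.64 x 0.9025, about 0.468. Storing only these marginals per span is enough to recompute the score against any exemplar later.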
The proposed use case is why I started working on chromophobe
(https://github.com/ttriche/chromophobe), but as time went by it seemed
like I was the only one using it, and (worse) at that point I hadn't begun
to automate documentation and test cases. The idea was to store a joint
segmentation model along with its segment-wise uncertainties, which
probably benefits from a bigMatrix or other out-of-core backing store for
the uncertainties (perhaps a big sparse Matrix would suffice). A use case
that might revive the exercise would be importing all of the ChromImpute
tracks and the associated transition/emission matrices, with the call
uncertainties perhaps as a second milestone. These sorts of issues show up
fairly regularly, disguised as other problems, so it may be worth doing,
similar to the multi-assay approach to imputing missing assays.
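One possible shape for such a container, sketched in Python under loud assumptions: `JointSegmentation` and its methods are hypothetical, and a plain dict keyed by (bin, state) stands in for the sparse or out-of-core backing store. Probabilities below `threshold` are dropped, so the stored uncertainties are an approximation, which is the price of keeping the store sparse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class JointSegmentation:
    states: List[str]
    marks: List[str]
    transitions: List[List[float]]  # states x states, each row sums to 1
    emissions: List[List[float]]    # states x marks, p(mark present | state)
    threshold: float = 0.05         # assumption: smaller posteriors are not stored
    _post: Dict[Tuple[int, int], float] = field(default_factory=dict,
                                                init=False, repr=False)

    def __post_init__(self) -> None:
        n = len(self.states)
        if len(self.transitions) != n or any(len(r) != n for r in self.transitions):
            raise ValueError("transition matrix must be states x states")
        if any(abs(sum(r) - 1.0) > 1e-6 for r in self.transitions):
            raise ValueError("transition rows must sum to 1")
        if len(self.emissions) != n or any(len(r) != len(self.marks) for r in self.emissions):
            raise ValueError("emission matrix must be states x marks")

    def add_posteriors(self, first_bin: int, rows: Sequence[Sequence[float]]) -> None:
        """Store p(state | bin) for consecutive bins, keeping only values >= threshold."""
        for offset, row in enumerate(rows):
            for k, p in enumerate(row):
                if p >= self.threshold:
                    self._post[(first_bin + offset, k)] = p

    def posterior(self, bin_index: int, state: str) -> float:
        """Stored probability, or 0.0 if never stored or below threshold."""
        return self._post.get((bin_index, self.states.index(state)), 0.0)

    @property
    def nnz(self) -> int:
        """Number of stored (bin, state) entries."""
        return len(self._post)
```

Validating the transition and emission matrices at construction time means an import of the ChromImpute model can fail early, before any per-bin uncertainties are loaded as the second milestone.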
--t
On Wed, Aug 12, 2015 at 2:01 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> It seems to me we may need a class to manage related annotation
> structures, for example the ChromImpute segmentations of the genome
> defined for various cell types. I would like to be able to take a region
> of the genome (say a SNP) and ask how the state varies across cell types.
>
> AnnotationHub will provide access to cell-type-specific GRanges, but there
> is no container that I can think of that would coordinate these as
> analogous to different "samples".
>
> Am I missing something?
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
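For the SNP query above, a minimal Python sketch of the container shape being asked about, assuming each cell type's segmentation is a list of non-overlapping (start, end, state) intervals with 1-based closed coordinates as in GRanges. `SegmentationSet`, `state_at` and the cell-type labels in the usage note are hypothetical; in Bioconductor this would naturally be built on GRanges rather than tuples.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end, state), 1-based closed like GRanges


class SegmentationSet:
    """Hypothetical container: one non-overlapping segmentation per cell
    type, each treated like a 'sample'."""

    def __init__(self) -> None:
        # cell type -> seqname -> (sorted starts, segments sorted by start)
        self._tracks: Dict[str, Dict[str, Tuple[List[int], List[Segment]]]] = {}

    def add(self, cell_type: str, seqname: str, segments: List[Segment]) -> None:
        segs = sorted(segments)
        starts = [start for start, _, _ in segs]
        self._tracks.setdefault(cell_type, {})[seqname] = (starts, segs)

    def state_at(self, seqname: str, pos: int) -> Dict[str, Optional[str]]:
        """State covering `pos` in each cell type, or None if uncovered."""
        out: Dict[str, Optional[str]] = {}
        for cell_type, by_seq in self._tracks.items():
            starts, segs = by_seq.get(seqname, ([], []))
            i = bisect_right(starts, pos) - 1  # last segment starting at or before pos
            if i >= 0 and segs[i][1] >= pos:
                out[cell_type] = segs[i][2]
            else:
                out[cell_type] = None
        return out
```

For example, after `add("cell_A", "chr1", [(1, 1000, "Quies"), (1001, 1200, "EnhA")])` and `add("cell_B", "chr1", [(1, 1100, "TssA")])`, `state_at("chr1", 1050)` maps cell_A to EnhA and cell_B to TssA. Keeping the starts sorted makes each per-cell-type lookup a binary search rather than a scan.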