[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Michael Lawrence
lawrence.michael at gene.com
Tue Sep 9 03:42:38 CEST 2014
Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:
hg19:chr1 hg19:chr2 ...
[lengths ...]
Not sure what condensed notation would be useful for circularity.
On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hickey at wehi.edu.au> wrote:
> Perhaps it might be useful to have some way of highlighting if any of the
> chromosomes are circular or highlighting if there are multiple genomes
> present? Otherwise this information might be hidden in the "…"
>
> Cheers,
> Pete
>
>
> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
> >> Just a vote for still allowing for multiple genomes in a Seqinfo object
> (in a GRanges object). My use case is in bisulfite-sequencing experiments
> where there is often a spike-in of a lambda phage genome along with the
> genome of interest (human or mouse). It's often useful to keep all data
> from a single library together in the same objet but process according to
> genome(x) for each seqlevel.
> >
> > Note taken. Thanks Pete! It's always great to know about concrete use
> > cases.
> >
> >>
> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the
> show method.
> >
> > Or what about displaying the genome next to the seqlevel it's
> > associated with? Like e.g.:
> >
> > > gr
> > GRanges with 3 ranges and 0 metadata columns:
> > seqnames ranges strand
> > <Rle> <IRanges> <Rle>
> > [1] chr14 [19069583, 19069654] +
> > [2] chr14 [19363738, 19363809] +
> > [3] chr14 [19363755, 19363826] -
> > [4] chr14 [19369799, 19369870] +
> > ---
> > seqinfo:
> > seqlevels seqlengths isCircular genome
> > chr1 249250621 <NA> hg19
> > chr10 135534747 <NA> hg19
> > chr11 135006516 <NA> hg19
> > ... ... ... ...
> > chrUn_gl000249 38502 <NA> hg19
> > chrX 155270560 <NA> hg19
> > chrY 59373566 <NA> hg19
> >
> > That way, we also raise awareness about the isCircular field.
> > The current choice to only display the seqlengths pre-dates the
> > existence of the seqinfo slot but might be a little bit misleading
> > those days since it only exposes some arbitrary seqinfo fields.
> >
> > H.
> >
> >>
> >> Cheers,
> >> Pete
> >>
> >>
> >>> I might have requested the genome annotation, but I'm pretty sure it
> wasn't
> >>> me who decide on tracking it on a per-sequence basis. I could imagine
> use
> >>> cases for that though, e.g., when diagnosing sequencing contamination
> (like
> >>> human vs. mouse). But most other tools and file formats expect a single
> >>> genome per "track", so, for example, rtracklayer has an internal
> function
> >>> singleGenome() to take care of this.
> >>>
> >>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <hpages at fhcrc.org> wrote:
> >>>
> >>>> Hi Vince,
> >>>>
> >>>> Yes it would make sense to have the "show" method report the genome
> >>>> when genome(x) contains a unique non-NA value. I think the main
> >>>> use case for having the genome defined at the sequence level instead
> >>>> of the whole object level is metagenomics. Maybe Michael has some
> other
> >>>> good use cases to share since IIRC he requested the addition of the
> >>>> genome field a couple of years ago and made the case for having it
> >>>> defined at the sequence level.
> >>>>
> >>>> Cheers,
> >>>> H.
> >>>>
> >>>>
> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
> >>>>
> >>>>> For GRanges x, my naive expectation is that genome(x) returns a
> length-
> >>>>>
> >>>>> one tag identifying the genome to which chromosomal coordinates
> >>>>>
> >>>>> correspond. The genome() method seems to have sequence-specific
> >>>>>
> >>>>> semantics, which makes sense, but when we identify sequence
> >>>>>
> >>>>> with chromosome, it seems too complicated. Is there a use case for
> >>>>>
> >>>>> a GRanges with sequences from several different genomes?
> >>>>>
> >>>>>
> >>>>> One reason I am inquiring is that I feel it would be nice to have the
> >>>>> GRanges show() method report, prominently, the genome in use (or NA
> >>>>>
> >>>>> if unspecified). This could be accomplished by reporting
> >>>>> unique(genome(x)), and perhaps that would be satisfactory.
> >>>>>
> >>>>> after example(genome) :
> >>>>>
> >>>>> seqinfo(txdb)
> >>>>>>
> >>>>>
> >>>>> Seqinfo of length 15
> >>>>>
> >>>>> seqnames seqlengths isCircular genome
> >>>>>
> >>>>> CH2L 23011544 FALSE dm3
> >>>>>
> >>>>> CH2R 21146708 FALSE dm3
> >>>>>
> >>>>> CH3L 24543557 FALSE dm3
> >>>>>
> >>>>> CH3R 27905053 FALSE dm3
> >>>>>
> >>>>> CH4 1351857 FALSE dm3
> >>>>>
> >>>>> ... ... ... ...
> >>>>>
> >>>>> CH3LHet 2555491 FALSE dm3
> >>>>>
> >>>>> CH3RHet 2517507 FALSE dm3
> >>>>>
> >>>>> CHXHet 204112 FALSE dm3
> >>>>>
> >>>>> CHYHet 347038 FALSE dm3
> >>>>>
> >>>>> CHUextra 29004656 FALSE dm3
> >>>>>
> >>>>> genome(seqinfo(txdb))
> >>>>>>
> >>>>>
> >>>>> CH2L CH2R CH3L CH3R CH4 CHX CHU
> M
> >>>>>
> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
> "dm3"
> >>>>>
> >>>>> CH2LHet CH2RHet CH3LHet CH3RHet CHXHet CHYHet CHUextra
> >>>>>
> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
> >>>>>
> >>>>> [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>
> >>>>>
> >>>> --
> >>>> Herv? Pag?s
> >>>>
> >>>> Program in Computational Biology
> >>>> Division of Public Health Sciences
> >>>> Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N, M1-B514
> >>>> P.O. Box 19024
> >>>> Seattle, WA 98109-1024
> >>>>
> >>>> E-mail: hpages at fhcrc.org
> >>>> Phone: (206) 667-5791
> >>>> Fax: (206) 667-1319
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>
> >> --------------------------------
> >> Peter Hickey,
> >> PhD Student/Research Assistant,
> >> Bioinformatics Division,
> >> Walter and Eliza Hall Institute of Medical Research,
> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
> >> Ph: +613 9345 2324
> >>
> >> hickey at wehi.edu.au
> >> http://www.wehi.edu.au
> >>
> >> ______________________________________________________________________
> >> The information in this email is confidential and intend...{{dropped:6}}
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319
>
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324
>
> hickey at wehi.edu.au
> http://www.wehi.edu.au
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:12}}
More information about the Bioc-devel
mailing list