[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Michael Lawrence lawrence.michael at gene.com
Tue Sep 9 03:42:38 CEST 2014


Instead of printing out multiple lines of a table that is rarely of
interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.


On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hickey at wehi.edu.au> wrote:

> Perhaps it might be useful to have some way of highlighting if any of the
> chromosomes are circular or highlighting if there are multiple genomes
> present? Otherwise this information might be hidden in the "…"
>
> Cheers,
> Pete
>
>
> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
> >> Just a vote for still allowing for multiple genomes in a Seqinfo object
> (in a GRanges object). My use case is in bisulfite-sequencing experiments
> where there is often a spike-in of a lambda phage genome along with the
> genome of interest (human or mouse). It's often useful to keep all data
> from a single library together in the same objet but process according to
> genome(x) for each seqlevel.
> >
> > Note taken. Thanks Pete! It's always great to know about concrete use
> > cases.
> >
> >>
> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the
> show method.
> >
> > Or what about displaying the genome next to the seqlevel it's
> > associated with? Like e.g.:
> >
> >  > gr
> >  GRanges with 3 ranges and 0 metadata columns:
> >        seqnames               ranges strand
> >           <Rle>            <IRanges>  <Rle>
> >    [1]    chr14 [19069583, 19069654]      +
> >    [2]    chr14 [19363738, 19363809]      +
> >    [3]    chr14 [19363755, 19363826]      -
> >    [4]    chr14 [19369799, 19369870]      +
> >    ---
> >    seqinfo:
> >      seqlevels             seqlengths isCircular genome
> >      chr1                   249250621       <NA>   hg19
> >      chr10                  135534747       <NA>   hg19
> >      chr11                  135006516       <NA>   hg19
> >      ...                          ...        ...    ...
> >      chrUn_gl000249             38502       <NA>   hg19
> >      chrX                   155270560       <NA>   hg19
> >      chrY                    59373566       <NA>   hg19
> >
> > That way, we also raise awareness about the isCircular field.
> > The current choice to only display the seqlengths pre-dates the
> > existence of the seqinfo slot but might be a little bit misleading
> > those days since it only exposes some arbitrary seqinfo fields.
> >
> > H.
> >
> >>
> >> Cheers,
> >> Pete
> >>
> >>
> >>> I might have requested the genome annotation, but I'm pretty sure it
> wasn't
> >>> me who decide on tracking it on a per-sequence basis. I could imagine
> use
> >>> cases for that though, e.g., when diagnosing sequencing contamination
> (like
> >>> human vs. mouse). But most other tools and file formats expect a single
> >>> genome per "track", so, for example, rtracklayer has an internal
> function
> >>> singleGenome() to take care of this.
> >>>
> >>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <hpages at fhcrc.org> wrote:
> >>>
> >>>> Hi Vince,
> >>>>
> >>>> Yes it would make sense to have the "show" method report the genome
> >>>> when genome(x) contains a unique non-NA value. I think the main
> >>>> use case for having the genome defined at the sequence level instead
> >>>> of the whole object level is metagenomics. Maybe Michael has some
> other
> >>>> good use cases to share since IIRC he requested the addition of the
> >>>> genome field a couple of years ago and made the case for having it
> >>>> defined at the sequence level.
> >>>>
> >>>> Cheers,
> >>>> H.
> >>>>
> >>>>
> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
> >>>>
> >>>>> For GRanges x, my naive expectation is that genome(x) returns a
> length-
> >>>>>
> >>>>> one tag identifying the genome to which chromosomal coordinates
> >>>>>
> >>>>> correspond.  The genome() method seems to have sequence-specific
> >>>>>
> >>>>> semantics, which makes sense, but when we identify sequence
> >>>>>
> >>>>> with chromosome, it seems too complicated.  Is there a use case for
> >>>>>
> >>>>> a GRanges with sequences from several different genomes?
> >>>>>
> >>>>>
> >>>>> One reason I am inquiring is that I feel it would be nice to have the
> >>>>> GRanges show() method report, prominently, the genome in use (or NA
> >>>>>
> >>>>> if unspecified).  This could be accomplished by reporting
> >>>>> unique(genome(x)), and perhaps that would be satisfactory.
> >>>>>
> >>>>> after example(genome) :
> >>>>>
> >>>>> seqinfo(txdb)
> >>>>>>
> >>>>>
> >>>>> Seqinfo of length 15
> >>>>>
> >>>>> seqnames seqlengths isCircular genome
> >>>>>
> >>>>> CH2L       23011544      FALSE    dm3
> >>>>>
> >>>>> CH2R       21146708      FALSE    dm3
> >>>>>
> >>>>> CH3L       24543557      FALSE    dm3
> >>>>>
> >>>>> CH3R       27905053      FALSE    dm3
> >>>>>
> >>>>> CH4         1351857      FALSE    dm3
> >>>>>
> >>>>> ...             ...        ...    ...
> >>>>>
> >>>>> CH3LHet     2555491      FALSE    dm3
> >>>>>
> >>>>> CH3RHet     2517507      FALSE    dm3
> >>>>>
> >>>>> CHXHet       204112      FALSE    dm3
> >>>>>
> >>>>> CHYHet       347038      FALSE    dm3
> >>>>>
> >>>>> CHUextra   29004656      FALSE    dm3
> >>>>>
> >>>>> genome(seqinfo(txdb))
> >>>>>>
> >>>>>
> >>>>>     CH2L     CH2R     CH3L     CH3R      CH4      CHX      CHU
>   M
> >>>>>
> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
> "dm3"
> >>>>>
> >>>>>  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
> >>>>>
> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
> >>>>>
> >>>>>        [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>
> >>>>>
> >>>> --
> >>>> Herv? Pag?s
> >>>>
> >>>> Program in Computational Biology
> >>>> Division of Public Health Sciences
> >>>> Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N, M1-B514
> >>>> P.O. Box 19024
> >>>> Seattle, WA 98109-1024
> >>>>
> >>>> E-mail: hpages at fhcrc.org
> >>>> Phone:  (206) 667-5791
> >>>> Fax:    (206) 667-1319
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>
> >> --------------------------------
> >> Peter Hickey,
> >> PhD Student/Research Assistant,
> >> Bioinformatics Division,
> >> Walter and Eliza Hall Institute of Medical Research,
> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
> >> Ph: +613 9345 2324
> >>
> >> hickey at wehi.edu.au
> >> http://www.wehi.edu.au
> >>
> >> ______________________________________________________________________
> >> The information in this email is confidential and intend...{{dropped:6}}
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fhcrc.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
>
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324
>
> hickey at wehi.edu.au
> http://www.wehi.edu.au
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:12}}



More information about the Bioc-devel mailing list