[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Hervé Pagès
hpages at fhcrc.org
Tue Sep 9 06:30:56 CEST 2014
On 09/08/2014 06:42 PM, Michael Lawrence wrote:
> Instead of printing out multiple lines of a table that is rarely of
> interest, could we develop Peter's idea toward something like:
>
> hg19:chr1 hg19:chr2 ...
> [lengths ...]
>
> Not sure what condensed notation would be useful for circularity.
I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.
With the table-like layout, some screen real estate can always be
saved by printing less lines:
> gr
GRanges with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr14 [19069583, 19069654] +
[2] chr14 [19363738, 19363809] +
[3] chr14 [19363755, 19363826] -
[4] chr14 [19369799, 19369870] +
--- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) ---
seqlevels seqlengths isCircular genome
chr1 249250621 <NA> hg19
chr10 135534747 <NA> hg19
... ... ... ...
chrX 155270560 <NA> hg19
chrY 59373566 <NA> hg19
I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it like what it really is (i.e. a table). Also displaying the
column names is a well established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.
H.
>
>
> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hickey at wehi.edu.au
> <mailto:hickey at wehi.edu.au>> wrote:
>
> Perhaps it might be useful to have some way of highlighting if any
> of the chromosomes are circular or highlighting if there are
> multiple genomes present? Otherwise this information might be hidden
> in the "…"
>
> Cheers,
> Pete
>
>
> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
> >> Just a vote for still allowing for multiple genomes in a Seqinfo
> object (in a GRanges object). My use case is in bisulfite-sequencing
> experiments where there is often a spike-in of a lambda phage genome
> along with the genome of interest (human or mouse). It's often
> useful to keep all data from a single library together in the same
> objet but process according to genome(x) for each seqlevel.
> >
> > Note taken. Thanks Pete! It's always great to know about concrete use
> > cases.
> >
> >>
> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
> in the show method.
> >
> > Or what about displaying the genome next to the seqlevel it's
> > associated with? Like e.g.:
> >
> > > gr
> > GRanges with 3 ranges and 0 metadata columns:
> > seqnames ranges strand
> > <Rle> <IRanges> <Rle>
> > [1] chr14 [19069583, 19069654] +
> > [2] chr14 [19363738, 19363809] +
> > [3] chr14 [19363755, 19363826] -
> > [4] chr14 [19369799, 19369870] +
> > ---
> > seqinfo:
> > seqlevels seqlengths isCircular genome
> > chr1 249250621 <NA> hg19
> > chr10 135534747 <NA> hg19
> > chr11 135006516 <NA> hg19
> > ... ... ... ...
> > chrUn_gl000249 38502 <NA> hg19
> > chrX 155270560 <NA> hg19
> > chrY 59373566 <NA> hg19
> >
> > That way, we also raise awareness about the isCircular field.
> > The current choice to only display the seqlengths pre-dates the
> > existence of the seqinfo slot but might be a little bit misleading
> > those days since it only exposes some arbitrary seqinfo fields.
> >
> > H.
> >
> >>
> >> Cheers,
> >> Pete
> >>
> >>
> >>> I might have requested the genome annotation, but I'm pretty
> sure it wasn't
> >>> me who decide on tracking it on a per-sequence basis. I could
> imagine use
> >>> cases for that though, e.g., when diagnosing sequencing
> contamination (like
> >>> human vs. mouse). But most other tools and file formats expect
> a single
> >>> genome per "track", so, for example, rtracklayer has an
> internal function
> >>> singleGenome() to take care of this.
> >>>
> >>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
> >>>
> >>>> Hi Vince,
> >>>>
> >>>> Yes it would make sense to have the "show" method report the
> genome
> >>>> when genome(x) contains a unique non-NA value. I think the main
> >>>> use case for having the genome defined at the sequence level
> instead
> >>>> of the whole object level is metagenomics. Maybe Michael has
> some other
> >>>> good use cases to share since IIRC he requested the addition
> of the
> >>>> genome field a couple of years ago and made the case for having it
> >>>> defined at the sequence level.
> >>>>
> >>>> Cheers,
> >>>> H.
> >>>>
> >>>>
> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
> >>>>
> >>>>> For GRanges x, my naive expectation is that genome(x) returns
> a length-
> >>>>>
> >>>>> one tag identifying the genome to which chromosomal coordinates
> >>>>>
> >>>>> correspond. The genome() method seems to have sequence-specific
> >>>>>
> >>>>> semantics, which makes sense, but when we identify sequence
> >>>>>
> >>>>> with chromosome, it seems too complicated. Is there a use
> case for
> >>>>>
> >>>>> a GRanges with sequences from several different genomes?
> >>>>>
> >>>>>
> >>>>> One reason I am inquiring is that I feel it would be nice to
> have the
> >>>>> GRanges show() method report, prominently, the genome in use
> (or NA
> >>>>>
> >>>>> if unspecified). This could be accomplished by reporting
> >>>>> unique(genome(x)), and perhaps that would be satisfactory.
> >>>>>
> >>>>> after example(genome) :
> >>>>>
> >>>>> seqinfo(txdb)
> >>>>>>
> >>>>>
> >>>>> Seqinfo of length 15
> >>>>>
> >>>>> seqnames seqlengths isCircular genome
> >>>>>
> >>>>> CH2L 23011544 FALSE dm3
> >>>>>
> >>>>> CH2R 21146708 FALSE dm3
> >>>>>
> >>>>> CH3L 24543557 FALSE dm3
> >>>>>
> >>>>> CH3R 27905053 FALSE dm3
> >>>>>
> >>>>> CH4 1351857 FALSE dm3
> >>>>>
> >>>>> ... ... ... ...
> >>>>>
> >>>>> CH3LHet 2555491 FALSE dm3
> >>>>>
> >>>>> CH3RHet 2517507 FALSE dm3
> >>>>>
> >>>>> CHXHet 204112 FALSE dm3
> >>>>>
> >>>>> CHYHet 347038 FALSE dm3
> >>>>>
> >>>>> CHUextra 29004656 FALSE dm3
> >>>>>
> >>>>> genome(seqinfo(txdb))
> >>>>>>
> >>>>>
> >>>>> CH2L CH2R CH3L CH3R CH4 CHX
> CHU M
> >>>>>
> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
> "dm3" "dm3"
> >>>>>
> >>>>> CH2LHet CH2RHet CH3LHet CH3RHet CHXHet CHYHet CHUextra
> >>>>>
> >>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
> >>>>>
> >>>>> [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
> mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>
> >>>>>
> >>>> --
> >>>> Herv? Pag?s
> >>>>
> >>>> Program in Computational Biology
> >>>> Division of Public Health Sciences
> >>>> Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N, M1-B514
> >>>> P.O. Box 19024
> >>>> Seattle, WA 98109-1024
> >>>>
> >>>> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> >>>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> >>>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
> mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>
> >> --------------------------------
> >> Peter Hickey,
> >> PhD Student/Research Assistant,
> >> Bioinformatics Division,
> >> Walter and Eliza Hall Institute of Medical Research,
> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
> >> Ph: +613 9345 2324 <tel:%2B613%209345%202324>
> >>
> >> hickey at wehi.edu.au <mailto:hickey at wehi.edu.au>
> >> http://www.wehi.edu.au
> >>
> >>
> ______________________________________________________________________
> >> The information in this email is confidential and
> intend...{{dropped:6}}
> >>
> >> _______________________________________________
> >>Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> >>https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> > Phone:(206) 667-5791 <tel:%28206%29%20667-5791>
> > Fax:(206) 667-1319 <tel:%28206%29%20667-1319>
>
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324 <tel:%2B613%209345%202324>
>
> hickey at wehi.edu.au <mailto:hickey at wehi.edu.au>
> http://www.wehi.edu.au
>
>
> ______________________________________________________________________
> The information in this email is confidential and intend...{{dropped:8}}
>
>
> _______________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list