[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Vincent Carey
stvjc at channing.harvard.edu
Tue Sep 9 07:08:23 CEST 2014
On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> On 09/08/2014 06:42 PM, Michael Lawrence wrote:
>
>> Instead of printing out multiple lines of a table that is rarely of
>> interest, could we develop Peter's idea toward something like:
>>
>> hg19:chr1 hg19:chr2 ...
>> [lengths ...]
>>
>> Not sure what condensed notation would be useful for circularity.
>>
>
> I don't know either. I'm worried that this would make the seqinfo
> stuff look like a named vector and that the user would expect
> hg19:chr1, hg19:chr2, etc... to be valid names.
>
> With the table-like layout, some screen real estate can always be
> saved by printing fewer lines:
>
>
What I had in mind was
> > gr
> GRanges with 4 ranges and 0 metadata columns:
>
genome: hg19
>       seqnames               ranges strand
>          <Rle>            <IRanges>  <Rle>
>   [1]    chr14 [19069583, 19069654]      +
>   [2]    chr14 [19363738, 19363809]      +
>   [3]    chr14 [19363755, 19363826]      -
>   [4]    chr14 [19369799, 19369870]      +
> [4] chr14 [19369799, 19369870] +
>
You could then probably dispense with the seqlengths. I have
never found them too useful except as a key to the genome.
If there are multiple genomes, we would have something like

genomes: hg19, mm9

The point is to make the genome prominent, particularly at a time of
transition.
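
A minimal sketch of how such a header line could be built, in base R only.
The helper name genome_line is hypothetical and is not part of GenomicRanges;
it assumes its input is a named character vector with one entry per seqlevel,
like the one genome(x) returns.

```r
# Hypothetical helper (not an existing Bioconductor function): build the
# one-line genome summary from a named character vector shaped like the
# value of genome(x).
genome_line <- function(g) {
  u <- unique(g[!is.na(g)])            # distinct non-NA genome tags
  if (length(u) == 0L) return("genome: NA")
  label <- if (length(u) == 1L) "genome" else "genomes"
  paste0(label, ": ", paste(u, collapse = ", "))
}

# Stand-ins for genome(gr); the seqlevel names are made up for illustration.
single <- c(chr1 = "hg19", chr2 = "hg19")
mixed  <- c(chr1 = "hg19", chr2 = "hg19", chrM = "mm9")

cat(genome_line(single), "\n")
cat(genome_line(mixed), "\n")
```

A show method could print this line right under the "GRanges with ..." header;
where exactly it goes is a design choice still open in this thread.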
> --- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) ---
>   seqlevels seqlengths isCircular genome
>   chr1       249250621       <NA>   hg19
>   chr10      135534747       <NA>   hg19
>   ...              ...        ...    ...
>   chrX       155270560       <NA>   hg19
>   chrY        59373566       <NA>   hg19
>
> I agree that the exact content of the seqinfo table itself is rarely
> of interest so printing only 3 or 4 lines is OK. IMO it's important
> to make the user aware of the existence of this hidden table and to
> display it like what it really is (i.e. a table). Also displaying the
> column names is a well established tradition and serves the purpose
> of providing a quick summary of the accessors that are available to
> access those fields.
>
> H.
>
>
>>
>> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hickey at wehi.edu.au> wrote:
>>
>> Perhaps it might be useful to have some way of highlighting if any
>> of the chromosomes are circular or highlighting if there are
>> multiple genomes present? Otherwise this information might be hidden
>> in the "…"
>>
>> Cheers,
>> Pete
>>
>>
>> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>>
>> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
>> >> Just a vote for still allowing for multiple genomes in a Seqinfo
>> >> object (in a GRanges object). My use case is in bisulfite-sequencing
>> >> experiments where there is often a spike-in of a lambda phage genome
>> >> along with the genome of interest (human or mouse). It's often
>> >> useful to keep all data from a single library together in the same
>> >> object but process according to genome(x) for each seqlevel.
>> >
>> > Note taken. Thanks Pete! It's always great to know about concrete
>> > use cases.
>> >
>> >>
>> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
>> >> in the show method.
>> >
>> > Or what about displaying the genome next to the seqlevel it's
>> > associated with? Like e.g.:
>> >
>> > > gr
>> > GRanges with 4 ranges and 0 metadata columns:
>> >       seqnames               ranges strand
>> >          <Rle>            <IRanges>  <Rle>
>> >   [1]    chr14 [19069583, 19069654]      +
>> >   [2]    chr14 [19363738, 19363809]      +
>> >   [3]    chr14 [19363755, 19363826]      -
>> >   [4]    chr14 [19369799, 19369870]      +
>> > ---
>> > seqinfo:
>> >   seqlevels      seqlengths isCircular genome
>> >   chr1            249250621       <NA>   hg19
>> >   chr10           135534747       <NA>   hg19
>> >   chr11           135006516       <NA>   hg19
>> >   ...                   ...        ...    ...
>> >   chrUn_gl000249      38502       <NA>   hg19
>> >   chrX            155270560       <NA>   hg19
>> >   chrY             59373566       <NA>   hg19
>> >
>> > That way, we also raise awareness about the isCircular field.
>> > The current choice to only display the seqlengths pre-dates the
>> > existence of the seqinfo slot but might be a little bit misleading
>> > these days since it only exposes some arbitrary seqinfo fields.
>> >
>> > H.
>> >
>> >>
>> >> Cheers,
>> >> Pete
>> >>
>> >>
>> >>> I might have requested the genome annotation, but I'm pretty sure it
>> >>> wasn't me who decided on tracking it on a per-sequence basis. I could
>> >>> imagine use cases for that though, e.g., when diagnosing sequencing
>> >>> contamination (like human vs. mouse). But most other tools and file
>> >>> formats expect a single genome per "track", so, for example,
>> >>> rtracklayer has an internal function singleGenome() to take care of
>> >>> this.
>> >>>
>> >>> On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>> >>>
>> >>>> Hi Vince,
>> >>>>
>> >>>> Yes it would make sense to have the "show" method report the genome
>> >>>> when genome(x) contains a unique non-NA value. I think the main
>> >>>> use case for having the genome defined at the sequence level instead
>> >>>> of the whole object level is metagenomics. Maybe Michael has some
>> >>>> other good use cases to share since IIRC he requested the addition
>> >>>> of the genome field a couple of years ago and made the case for
>> >>>> having it defined at the sequence level.
>> >>>>
>> >>>> Cheers,
>> >>>> H.
>> >>>>
>> >>>>
>> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
>> >>>>
>> >>>>> For GRanges x, my naive expectation is that genome(x) returns a
>> >>>>> length-one tag identifying the genome to which chromosomal
>> >>>>> coordinates correspond. The genome() method seems to have
>> >>>>> sequence-specific semantics, which makes sense, but when we
>> >>>>> identify sequence with chromosome, it seems too complicated. Is
>> >>>>> there a use case for a GRanges with sequences from several
>> >>>>> different genomes?
>> >>>>>
>> >>>>> One reason I am inquiring is that I feel it would be nice to have
>> >>>>> the GRanges show() method report, prominently, the genome in use
>> >>>>> (or NA if unspecified). This could be accomplished by reporting
>> >>>>> unique(genome(x)), and perhaps that would be satisfactory.
>> >>>>>
>> >>>>> after example(genome):
>> >>>>>
>> >>>>> > seqinfo(txdb)
>> >>>>> Seqinfo of length 15
>> >>>>> seqnames seqlengths isCircular genome
>> >>>>> CH2L       23011544      FALSE    dm3
>> >>>>> CH2R       21146708      FALSE    dm3
>> >>>>> CH3L       24543557      FALSE    dm3
>> >>>>> CH3R       27905053      FALSE    dm3
>> >>>>> CH4         1351857      FALSE    dm3
>> >>>>> ...             ...        ...    ...
>> >>>>> CH3LHet     2555491      FALSE    dm3
>> >>>>> CH3RHet     2517507      FALSE    dm3
>> >>>>> CHXHet       204112      FALSE    dm3
>> >>>>> CHYHet       347038      FALSE    dm3
>> >>>>> CHUextra   29004656      FALSE    dm3
>> >>>>>
>> >>>>> > genome(seqinfo(txdb))
>> >>>>>     CH2L     CH2R     CH3L     CH3R      CH4      CHX      CHU        M
>> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
>> >>>>>  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
>> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Bioc-devel at r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>>>>
>> >>>>>
>> >>>> --
>> >>>> Hervé Pagès
>> >>>>
>> >>>> Program in Computational Biology
>> >>>> Division of Public Health Sciences
>> >>>> Fred Hutchinson Cancer Research Center
>> >>>> 1100 Fairview Ave. N, M1-B514
>> >>>> P.O. Box 19024
>> >>>> Seattle, WA 98109-1024
>> >>>>
>> >>>> E-mail: hpages at fhcrc.org
>> >>>> Phone: (206) 667-5791
>> >>>> Fax: (206) 667-1319
>> >>
>> >> --------------------------------
>> >> Peter Hickey,
>> >> PhD Student/Research Assistant,
>> >> Bioinformatics Division,
>> >> Walter and Eliza Hall Institute of Medical Research,
>> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> >> Ph: +613 9345 2324
>> >>
>> >> hickey at wehi.edu.au
>> >> http://www.wehi.edu.au
>> >>