[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Hervé Pagès
hpages at fhcrc.org
Tue Sep 9 20:48:26 CEST 2014
OK, so let's go for a one-line summary of the seqinfo.
Vince, is it OK if we keep this at the bottom of the object?
That way it will always be visible, even when the object requires
more than one screen to display (e.g. when it has a lot of metadata
columns). It will look something like:
> gr
GRanges with 3 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr14 [19069583, 19069654]      +
  [2]    chr14 [19363738, 19363809]      +
  [3]    chr14 [19363755, 19363826]      -
  [4]    chr14 [19369799, 19369870]      +
  seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); no seqlengths
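The proposed summary line can be derived from the four Seqinfo fields alone. Below is a minimal base-R sketch of that derivation, not the actual GenomicRanges show() code: summarizeSeqinfo() is a hypothetical helper that takes plain vectors (in a real session they would come from the seqlevels(), seqlengths(), isCircular() and genome() accessors), and the exact wording of the seqlengths part ("no seqlengths" here vs. Martin's "60 'NA' seqlengths" variant further down) is an assumption.

```r
# Hypothetical helper, not part of GenomeInfoDb or GenomicRanges: builds the
# one-line seqinfo summary proposed in this thread from plain vectors.
summarizeSeqinfo <- function(seqlevels, seqlengths, isCircular, genome) {
  n <- length(seqlevels)
  ncirc <- sum(isCircular %in% TRUE)          # NA is counted as not circular
  genomes <- unique(genome[!is.na(genome)])   # distinct non-NA genome tags
  nNA <- sum(is.na(seqlengths))               # number of unknown lengths
  circ <- if (ncirc > 0L) paste0(" (", ncirc, " circular)") else ""
  gen <- if (length(genomes) > 0L) {
    paste0(" (", paste(genomes, collapse = ", "), ")")
  } else ""
  lens <- if (nNA == n) {
    "no seqlengths"
  } else if (nNA == 0L) {
    "all seqlengths known"
  } else {
    paste0(nNA, " 'NA' seqlengths")           # Martin's wording
  }
  paste0("seqinfo: ", n, " seqlevel", if (n != 1L) "s" else "", circ,
         " on ", length(genomes), " genome",
         if (length(genomes) != 1L) "s" else "", gen, "; ", lens)
}

# Example: human plus a lambda phage spike-in, no lengths known yet
summarizeSeqinfo(
  seqlevels  = c("chr1", "chr2", "chrM", "lambda"),
  seqlengths = c(NA, NA, NA, NA),
  isCircular = c(FALSE, FALSE, TRUE, NA),
  genome     = c("hg19", "hg19", "hg19", "lambda")
)
# returns "seqinfo: 4 seqlevels (1 circular) on 2 genomes (hg19, lambda); no seqlengths"
```

Counting only TRUE values as circular keeps unknown (NA) circularity out of the "(n circular)" count, which matches how the mock-up only reports known circular sequences.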
Thanks,
H.
On 09/09/2014 06:38 AM, Michael Lawrence wrote:
> Agreed, that looks a lot nicer.
>
> On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>> On 09/09/2014 04:02 AM, Michael Lawrence wrote:
>>
>>> I'm in favor of this display. The seqinfo output at the bottom has always
>>> been annoying (over-emphasized).
>>>
>>
>> The fact that the lengths are 'NA' can be a helpful prompt to do something
>> about it, e.g., add seqinfo when inputting the data. Also they are helpful
>> when one is told that seqlengths are incompatible during, e.g.,
>> findOverlaps. But I like the idea of less but more informative display of
>> seqinfo, along the lines suggested by Vince.
>>
>> seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA' seqlengths
>>
>> Martin
>>
>>
>>> On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>
>>>
>>>>
>>>> On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>>>>
>>>> On 09/08/2014 06:42 PM, Michael Lawrence wrote:
>>>>>
>>>>> Instead of printing out multiple lines of a table that is rarely of
>>>>>> interest, could we develop Peter's idea toward something like:
>>>>>>
>>>>>> hg19:chr1 hg19:chr2 ...
>>>>>> [lengths ...]
>>>>>>
>>>>>> Not sure what condensed notation would be useful for circularity.
>>>>>>
>>>>>>
>>>>> I don't know either. I'm worried that this would make the seqinfo
>>>>> stuff look like a named vector and that the user would expect
>>>>> hg19:chr1, hg19:chr2, etc... to be valid names.
>>>>>
>>>>> With the table-like layout, some screen real estate can always be
>>>>> saved by printing fewer lines:
>>>>>
>>>>>
>>>> What I had in mind was
>>>>
>>>> > gr
>>>> GRanges with 3 ranges and 0 metadata columns:
>>>>
>>>> genome: hg19
>>>>
>>>>       seqnames               ranges strand
>>>>          <Rle>            <IRanges>  <Rle>
>>>>   [1]    chr14 [19069583, 19069654]      +
>>>>   [2]    chr14 [19363738, 19363809]      +
>>>>   [3]    chr14 [19363755, 19363826]      -
>>>>   [4]    chr14 [19369799, 19369870]      +
>>>>
>>>>
>>>> You could then probably dispense with the seqlengths. I have
>>>> never found them too useful except as a key to the genome.
>>>>
>>>> If there are multiple genomes, we would have something like
>>>>
>>>> genomes: hg19, mm9
>>>>
>>>> The point is to make it prominent, particularly at a time of transition.
>>>>
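Vince's header line ("genome: hg19", or "genomes: hg19, mm9" when several are present) can be computed from the per-sequence genome tags. A minimal base-R sketch follows; genomeHeader() is a hypothetical helper, not a GenomicRanges or GenomeInfoDb function. Its input stands in for the character vector that genome(x) returns, and printing "genome: NA" when no tag is set follows Vince's original suggestion to report NA if the genome is unspecified.

```r
# Hypothetical helper, not part of GenomicRanges/GenomeInfoDb: turns a
# vector of per-sequence genome tags (as returned by genome(x)) into the
# "genome:"/"genomes:" header line suggested in this thread.
genomeHeader <- function(genome) {
  tags <- unique(genome[!is.na(genome)])   # distinct non-NA genome tags
  if (length(tags) == 0L) {
    "genome: NA"                           # no genome specified at all
  } else if (length(tags) == 1L) {
    paste0("genome: ", tags)
  } else {
    paste0("genomes: ", paste(tags, collapse = ", "))
  }
}

genomeHeader(c(chr1 = "hg19", chr2 = "hg19"))      # "genome: hg19"
genomeHeader(c(chr1 = "hg19", chr2 = "mm9"))       # "genomes: hg19, mm9"
genomeHeader(c(chr1 = NA_character_))              # "genome: NA"
```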
>>>>
>>>>
>>>>> --- seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10) ---
>>>>>  seqlevels seqlengths isCircular genome
>>>>>       chr1  249250621       <NA>   hg19
>>>>>      chr10  135534747       <NA>   hg19
>>>>>        ...        ...        ...    ...
>>>>>       chrX  155270560       <NA>   hg19
>>>>>       chrY   59373566       <NA>   hg19
>>>>>
>>>>> I agree that the exact content of the seqinfo table itself is rarely
>>>>> of interest so printing only 3 or 4 lines is OK. IMO it's important
>>>>> to make the user aware of the existence of this hidden table and to
>>>>> display it as what it really is (i.e. a table). Also displaying the
>>>>> column names is a well established tradition and serves the purpose
>>>>> of providing a quick summary of the accessors that are available to
>>>>> access those fields.
>>>>>
>>>>> H.
>>>>>
>>>>>
>>>>>
>>>>>> On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hickey at wehi.edu.au> wrote:
>>>>>>
>>>>>> Perhaps it might be useful to have some way of highlighting if any
>>>>>> of the chromosomes are circular or highlighting if there are
>>>>>> multiple genomes present? Otherwise this information might be
>>>>>> hidden in the "…"
>>>>>>
>>>>>> Cheers,
>>>>>> Pete
>>>>>>
>>>>>>
>>>>>> On 09/09/2014, at 9:44 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>>>>>>
>>>>>> > On 09/08/2014 02:28 PM, Peter Hickey wrote:
>>>>>> >> Just a vote for still allowing for multiple genomes in a Seqinfo
>>>>>> >> object (in a GRanges object). My use case is in bisulfite-sequencing
>>>>>> >> experiments where there is often a spike-in of a lambda phage genome
>>>>>> >> along with the genome of interest (human or mouse). It's often
>>>>>> >> useful to keep all data from a single library together in the same
>>>>>> >> object but process according to genome(x) for each seqlevel.
>>>>>> >
>>>>>> > Note taken. Thanks Pete! It's always great to know about concrete
>>>>>> > use cases.
>>>>>> >
>>>>>> >>
>>>>>> >> FWIW, I like Vincent's proposal of selectSome(unique(genome(x)))
>>>>>> >> in the show method.
>>>>>> >
>>>>>> > Or what about displaying the genome next to the seqlevel it's
>>>>>> > associated with? Like e.g.:
>>>>>> >
>>>>>> > > gr
>>>>>> > GRanges with 3 ranges and 0 metadata columns:
>>>>>> >       seqnames               ranges strand
>>>>>> >          <Rle>            <IRanges>  <Rle>
>>>>>> >   [1]    chr14 [19069583, 19069654]      +
>>>>>> >   [2]    chr14 [19363738, 19363809]      +
>>>>>> >   [3]    chr14 [19363755, 19363826]      -
>>>>>> >   [4]    chr14 [19369799, 19369870]      +
>>>>>> > ---
>>>>>> > seqinfo:
>>>>>> >      seqlevels seqlengths isCircular genome
>>>>>> >           chr1  249250621       <NA>   hg19
>>>>>> >          chr10  135534747       <NA>   hg19
>>>>>> >          chr11  135006516       <NA>   hg19
>>>>>> >            ...        ...        ...    ...
>>>>>> > chrUn_gl000249      38502       <NA>   hg19
>>>>>> >           chrX  155270560       <NA>   hg19
>>>>>> >           chrY   59373566       <NA>   hg19
>>>>>> >
>>>>>> > That way, we also raise awareness about the isCircular field.
>>>>>> > The current choice to only display the seqlengths pre-dates the
>>>>>> > existence of the seqinfo slot but might be a little bit misleading
>>>>>> > these days since it only exposes some arbitrary seqinfo fields.
>>>>>> >
>>>>>> > H.
>>>>>> >
>>>>>> >>
>>>>>> >> Cheers,
>>>>>> >> Pete
>>>>>> >>
>>>>>> >>
>>>>>> >>> I might have requested the genome annotation, but I'm pretty sure it wasn't
>>>>>> >>> me who decided on tracking it on a per-sequence basis. I could imagine use
>>>>>> >>> cases for that though, e.g., when diagnosing sequencing contamination (like
>>>>>> >>> human vs. mouse). But most other tools and file formats expect a single
>>>>>> >>> genome per "track", so, for example, rtracklayer has an internal function
>>>>>> >>> singleGenome() to take care of this.
>>>>>> >>>
>>>>>> >>> On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>>>>>> >>>
>>>>>> >>>> Hi Vince,
>>>>>> >>>>
>>>>>> >>>> Yes it would make sense to have the "show" method report the genome
>>>>>> >>>> when genome(x) contains a unique non-NA value. I think the main
>>>>>> >>>> use case for having the genome defined at the sequence level instead
>>>>>> >>>> of the whole object level is metagenomics. Maybe Michael has some other
>>>>>> >>>> good use cases to share since IIRC he requested the addition of the
>>>>>> >>>> genome field a couple of years ago and made the case for having it
>>>>>> >>>> defined at the sequence level.
>>>>>> >>>>
>>>>>> >>>> Cheers,
>>>>>> >>>> H.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
>>>>>> >>>>
>>>>>> >>>>> For GRanges x, my naive expectation is that genome(x) returns a length-one
>>>>>> >>>>> tag identifying the genome to which chromosomal coordinates correspond.
>>>>>> >>>>> The genome() method seems to have sequence-specific semantics, which makes
>>>>>> >>>>> sense, but when we identify sequence with chromosome, it seems too
>>>>>> >>>>> complicated. Is there a use case for a GRanges with sequences from several
>>>>>> >>>>> different genomes?
>>>>>> >>>>>
>>>>>> >>>>> One reason I am inquiring is that I feel it would be nice to have the
>>>>>> >>>>> GRanges show() method report, prominently, the genome in use (or NA if
>>>>>> >>>>> unspecified). This could be accomplished by reporting unique(genome(x)),
>>>>>> >>>>> and perhaps that would be satisfactory.
>>>>>> >>>>>
>>>>>> >>>>> After example(genome):
>>>>>> >>>>>
>>>>>> >>>>> > seqinfo(txdb)
>>>>>> >>>>> Seqinfo of length 15
>>>>>> >>>>> seqnames seqlengths isCircular genome
>>>>>> >>>>> CH2L       23011544      FALSE    dm3
>>>>>> >>>>> CH2R       21146708      FALSE    dm3
>>>>>> >>>>> CH3L       24543557      FALSE    dm3
>>>>>> >>>>> CH3R       27905053      FALSE    dm3
>>>>>> >>>>> CH4         1351857      FALSE    dm3
>>>>>> >>>>> ...             ...        ...    ...
>>>>>> >>>>> CH3LHet     2555491      FALSE    dm3
>>>>>> >>>>> CH3RHet     2517507      FALSE    dm3
>>>>>> >>>>> CHXHet       204112      FALSE    dm3
>>>>>> >>>>> CHYHet       347038      FALSE    dm3
>>>>>> >>>>> CHUextra   29004656      FALSE    dm3
>>>>>> >>>>>
>>>>>> >>>>> > genome(seqinfo(txdb))
>>>>>> >>>>>     CH2L     CH2R     CH3L     CH3R      CH4      CHX      CHU        M
>>>>>> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
>>>>>> >>>>>  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra
>>>>>> >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> _______________________________________________
>>>>>> >>>>> Bioc-devel at r-project.org mailing list
>>>>>> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>> --
>>>>>> >>>> Hervé Pagès
>>>>>> >>>>
>>>>>> >>>> Program in Computational Biology
>>>>>> >>>> Division of Public Health Sciences
>>>>>> >>>> Fred Hutchinson Cancer Research Center
>>>>>> >>>> 1100 Fairview Ave. N, M1-B514
>>>>>> >>>> P.O. Box 19024
>>>>>> >>>> Seattle, WA 98109-1024
>>>>>> >>>>
>>>>>> >>>> E-mail: hpages at fhcrc.org
>>>>>> >>>> Phone: (206) 667-5791
>>>>>> >>>> Fax: (206) 667-1319
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>
>>>>>> >> --------------------------------
>>>>>> >> Peter Hickey,
>>>>>> >> PhD Student/Research Assistant,
>>>>>> >> Bioinformatics Division,
>>>>>> >> Walter and Eliza Hall Institute of Medical Research,
>>>>>> >> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>>>>> >> Ph: +613 9345 2324
>>>>>> >>
>>>>>> >> hickey at wehi.edu.au
>>>>>> >> http://www.wehi.edu.au
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>
>