[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Hervé Pagès
hpages at fhcrc.org
Tue Sep 9 01:44:29 CEST 2014
On 09/08/2014 02:28 PM, Peter Hickey wrote:
> Just a vote for still allowing for multiple genomes in a Seqinfo object (in a GRanges object). My use case is in bisulfite-sequencing experiments where there is often a spike-in of a lambda phage genome along with the genome of interest (human or mouse). It's often useful to keep all data from a single library together in the same objet but process according to genome(x) for each seqlevel.
Note taken. Thanks Pete! It's always great to know about concrete use
cases.
>
> FWIW, I like Vincent's proposal of selectSome(unique(genome(x))) in the show method.
Or what about displaying the genome next to the seqlevel it's
associated with? Like e.g.:
> gr
GRanges with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr14 [19069583, 19069654] +
[2] chr14 [19363738, 19363809] +
[3] chr14 [19363755, 19363826] -
[4] chr14 [19369799, 19369870] +
---
seqinfo:
seqlevels seqlengths isCircular genome
chr1 249250621 <NA> hg19
chr10 135534747 <NA> hg19
chr11 135006516 <NA> hg19
... ... ... ...
chrUn_gl000249 38502 <NA> hg19
chrX 155270560 <NA> hg19
chrY 59373566 <NA> hg19
That way, we also raise awareness about the isCircular field.
The current choice to only display the seqlengths pre-dates the
existence of the seqinfo slot but might be a little bit misleading
those days since it only exposes some arbitrary seqinfo fields.
H.
>
> Cheers,
> Pete
>
>
>> I might have requested the genome annotation, but I'm pretty sure it wasn't
>> me who decide on tracking it on a per-sequence basis. I could imagine use
>> cases for that though, e.g., when diagnosing sequencing contamination (like
>> human vs. mouse). But most other tools and file formats expect a single
>> genome per "track", so, for example, rtracklayer has an internal function
>> singleGenome() to take care of this.
>>
>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <hpages at fhcrc.org> wrote:
>>
>>> Hi Vince,
>>>
>>> Yes it would make sense to have the "show" method report the genome
>>> when genome(x) contains a unique non-NA value. I think the main
>>> use case for having the genome defined at the sequence level instead
>>> of the whole object level is metagenomics. Maybe Michael has some other
>>> good use cases to share since IIRC he requested the addition of the
>>> genome field a couple of years ago and made the case for having it
>>> defined at the sequence level.
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
>>>
>>>> For GRanges x, my naive expectation is that genome(x) returns a length-
>>>>
>>>> one tag identifying the genome to which chromosomal coordinates
>>>>
>>>> correspond. The genome() method seems to have sequence-specific
>>>>
>>>> semantics, which makes sense, but when we identify sequence
>>>>
>>>> with chromosome, it seems too complicated. Is there a use case for
>>>>
>>>> a GRanges with sequences from several different genomes?
>>>>
>>>>
>>>> One reason I am inquiring is that I feel it would be nice to have the
>>>> GRanges show() method report, prominently, the genome in use (or NA
>>>>
>>>> if unspecified). This could be accomplished by reporting
>>>> unique(genome(x)), and perhaps that would be satisfactory.
>>>>
>>>> after example(genome) :
>>>>
>>>> seqinfo(txdb)
>>>>>
>>>>
>>>> Seqinfo of length 15
>>>>
>>>> seqnames seqlengths isCircular genome
>>>>
>>>> CH2L 23011544 FALSE dm3
>>>>
>>>> CH2R 21146708 FALSE dm3
>>>>
>>>> CH3L 24543557 FALSE dm3
>>>>
>>>> CH3R 27905053 FALSE dm3
>>>>
>>>> CH4 1351857 FALSE dm3
>>>>
>>>> ... ... ... ...
>>>>
>>>> CH3LHet 2555491 FALSE dm3
>>>>
>>>> CH3RHet 2517507 FALSE dm3
>>>>
>>>> CHXHet 204112 FALSE dm3
>>>>
>>>> CHYHet 347038 FALSE dm3
>>>>
>>>> CHUextra 29004656 FALSE dm3
>>>>
>>>> genome(seqinfo(txdb))
>>>>>
>>>>
>>>> CH2L CH2R CH3L CH3R CH4 CHX CHU M
>>>>
>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
>>>>
>>>> CH2LHet CH2RHet CH3LHet CH3RHet CHXHet CHYHet CHUextra
>>>>
>>>> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>> --
>>> Herv? Pag?s
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fhcrc.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324
>
> hickey at wehi.edu.au
> http://www.wehi.edu.au
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:21}}
More information about the Bioc-devel
mailing list