[Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method
Hervé Pagès
hpages at fhcrc.org
Mon Sep 8 22:00:53 CEST 2014
On 09/08/2014 11:41 AM, Michael Lawrence wrote:
> I might have requested the genome annotation, but I'm pretty sure it
> wasn't me who decide on tracking it on a per-sequence basis.
OK, maybe. Don't trust my memory too much on this. No regrets though.
I think it was the right thing to do ;-) Just because the SAM/BAM
format does it is a good enough reason for us to do it too. According
to the SAM Spec, the header can look like:
@HD VN:1.3 SO:coordinate
@SQ SN:chr1_hg19 LN:45 AS:hg19
@SQ SN:chr1_mm10 LN:42 AS:mm10
The only problem is that seqinfo(BamFile("test.bam")) seems to ignore
the AS tag (genome assembly identifier) at the moment:
> seqinfo(BamFile("test.bam"))
Seqinfo of length 2
seqnames seqlengths isCircular genome
chr1_hg19 45 NA <NA>
chr1_mm10 42 NA <NA>
Hopefully that can be addressed. But that's a separate issue...
Cheers,
H.
> I could
> imagine use cases for that though, e.g., when diagnosing sequencing
> contamination (like human vs. mouse). But most other tools and file
> formats expect a single genome per "track", so, for example, rtracklayer
> has an internal function singleGenome() to take care of this.
>
> On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
> Hi Vince,
>
> Yes it would make sense to have the "show" method report the genome
> when genome(x) contains a unique non-NA value. I think the main
> use case for having the genome defined at the sequence level instead
> of the whole object level is metagenomics. Maybe Michael has some other
> good use cases to share since IIRC he requested the addition of the
> genome field a couple of years ago and made the case for having it
> defined at the sequence level.
>
> Cheers,
> H.
>
>
> On 09/08/2014 07:21 AM, Vincent Carey wrote:
>
> For GRanges x, my naive expectation is that genome(x) returns a
> length-
>
> one tag identifying the genome to which chromosomal coordinates
>
> correspond. The genome() method seems to have sequence-specific
>
> semantics, which makes sense, but when we identify sequence
>
> with chromosome, it seems too complicated. Is there a use case for
>
> a GRanges with sequences from several different genomes?
>
>
> One reason I am inquiring is that I feel it would be nice to
> have the
> GRanges show() method report, prominently, the genome in use (or NA
>
> if unspecified). This could be accomplished by reporting
> unique(genome(x)), and perhaps that would be satisfactory.
>
> after example(genome) :
>
> seqinfo(txdb)
>
>
> Seqinfo of length 15
>
> seqnames seqlengths isCircular genome
>
> CH2L 23011544 FALSE dm3
>
> CH2R 21146708 FALSE dm3
>
> CH3L 24543557 FALSE dm3
>
> CH3R 27905053 FALSE dm3
>
> CH4 1351857 FALSE dm3
>
> ... ... ... ...
>
> CH3LHet 2555491 FALSE dm3
>
> CH3RHet 2517507 FALSE dm3
>
> CHXHet 204112 FALSE dm3
>
> CHYHet 347038 FALSE dm3
>
> CHUextra 29004656 FALSE dm3
>
> genome(seqinfo(txdb))
>
>
> CH2L CH2R CH3L CH3R CH4 CHX
> CHU M
>
> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
> "dm3" "dm3"
>
> CH2LHet CH2RHet CH3LHet CH3RHet CHXHet CHYHet CHUextra
>
> "dm3" "dm3" "dm3" "dm3" "dm3" "dm3" "dm3"
>
> [[alternative HTML version deleted]]
>
> _________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
> mailing list
> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
> _________________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/__listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list