[Bioc-devel] on to AnnotationTrack with rtracklayer [was Re: IGV VCF demo, other suggestions? [was Re: IGV - a new package in preparation]]
Paul Shannon
paul.thurmond.shannon at gmail.com
Wed Mar 21 00:09:02 CET 2018
I have now implemented VCF tracks for IGV, supporting both
a local VCF object read and filtered by the VariantAnnotation package, and
a remote webserver-hosted vcf file.
In normal use I expect (and recommend) that the local VCF object will be relatively small (< 1Mb, < 50 samples - or some tradeoff of those approximate numbers), and that the genome scale vcf file is accompanied by an index.
I am now turning to annotation tracks: bed, bed9, gff, gff3, gtf. rtracklayer provides a good set of importers for these formats, and S4 classes to represent them (apparently all are subclasses of GenomicRanges):
BEDFile (3 required fields, up to 9 optional fields - https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
GFFFile (includes gff, gff3, gtf)
I propose to support four different representations of these data in R:
data.frame
the two rtracklayer classes
a url pointing to a web-hosted and indexed annotation
The AnnotationTrack constructor accepts all of these in the “annotation” parameter. A simple version of the call (with many parameters defaulted) is:
track <- AnnotationTrack(trackName, annotation, color, displayMode)
The annotation parameter will be inspected by the constructor: is it a data.frame? a BEDFile? a GFFFile? a url?
The local data is reformatted as needed into a file with a format igv.js understands - native bed and gff text files - then passed to igv as a local url. Remote urls are transmitted without change.
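A minimal sketch of that inspect-and-reformat step, in base R only. The helper names (classifyAnnotation, writeBedFile, prepareAnnotation) are hypothetical and are not part of the package; the BEDFile and GFFFile branches are only detected here, since converting them would need rtracklayer. It assumes a data.frame whose first three columns are chrom, start, end.

```r
# Hypothetical helper: decide which of the four representations was passed.
classifyAnnotation <- function(annotation) {
  if (is.data.frame(annotation)) return("data.frame")
  if (inherits(annotation, "BEDFile")) return("BEDFile")
  if (inherits(annotation, "GFFFile")) return("GFFFile")
  if (is.character(annotation) && length(annotation) == 1L &&
      grepl("^(https?|ftp)://", annotation)) return("url")
  stop("unsupported annotation type: ", class(annotation)[1L])
}

# Hypothetical helper: write a data.frame as a headerless, tab-delimited BED file.
writeBedFile <- function(tbl, path = tempfile(fileext = ".bed")) {
  stopifnot(is.data.frame(tbl), ncol(tbl) >= 3L)
  # Coordinates are coerced to integer so they are written as plain digits.
  tbl[[2]] <- as.integer(tbl[[2]])
  tbl[[3]] <- as.integer(tbl[[3]])
  write.table(tbl, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

# Hypothetical helper: local data becomes a file path, remote urls pass through.
prepareAnnotation <- function(annotation) {
  switch(classifyAnnotation(annotation),
         "data.frame" = writeBedFile(annotation),
         "url" = annotation,
         stop("BEDFile/GFFFile conversion needs rtracklayer; omitted from this sketch"))
}
```

For example, prepareAnnotation(data.frame(chrom = "chr5", start = 100L, end = 200L)) returns the path of a temporary .bed file, while a url string is returned unchanged.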
Does this sound right? If you have a minute to comment, now is a good time to offer critique and suggestions on annotation tracks.
Next up after the AnnotationTrack class will be alignment (bam) tracks and, if I get to it before the package submission date, a “seg” track for segmented copy number data.
Last week Gabe asked:
> If myigv represents the IGV session/state, then add_track(myigv, vcfobj) could call down to add_track(myigv, VariantTrack(vcf)) so you'd get the default behaviors. You could also support add_track(myigv, vcf, title = "bla", homVarColor = "whateverman") which would call down to add_track(myigv, VariantTrack(vcf, title = "bla", homVarColor = "whateverman"))
>
> This is easy to do (I'm assuming the IGVSession class name but replace it with whatever class add_track is endomorphic in...):
> setMethod("add_track", signature = c("IGVSession", "VCF"), function(igv, track, ...) add_track(igv, VariantTrack(track, ...)))
> setMethod("add_track", signature = c("IGVSession", "BAM"), function(igv, track, ...) add_track(igv, AlignmentTrack(track, ...)))
>
> This would, as Michael points out, give you the default values of the parameter when you just call add_track(myigv, vcfobj)
I hope I don’t sound disrespectful by describing these shorter methods as only syntactic simplifications with a little S4 dispatch thrown in. They have value, for sure, but are they not just a relatively thin layer on top of the classes I am writing now? *If* that description is accurate, then I’d rather consider adding them later, after the nuts and bolts and basic operations are all written, tested, and subjected to a few months of user QC. (I admit that I also prefer the greater operational clarity which, for me, with my plodding brain, comes from using explicit data types and explicit constructors.)
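To make the "thin layer" point concrete, here is a minimal sketch using toy stand-in classes. ToyVCF, ToyVariantTrack and ToySession are hypothetical names invented for illustration; they are not the VariantAnnotation or IGV package classes. The explicit-constructor method does the work, and the convenience method only forwards to it.

```r
library(methods)

# Toy stand-ins (hypothetical) for a VCF object, a variant track, and a session.
setClass("ToyVCF", slots = c(nVariants = "integer"))
setClass("ToyVariantTrack",
         slots = c(vcf = "ToyVCF", title = "character", homVarColor = "character"))
setClass("ToySession", slots = c(tracks = "list"))

# Explicit constructor: track-specific options live here, with defaults.
ToyVariantTrack <- function(vcf, title = "variants", homVarColor = "darkRed") {
  new("ToyVariantTrack", vcf = vcf, title = title, homVarColor = homVarColor)
}

setGeneric("add_track", function(igv, track, ...) standardGeneric("add_track"))

# The method that does the work: append an already-constructed track.
setMethod("add_track", signature("ToySession", "ToyVariantTrack"),
          function(igv, track, ...) {
            igv@tracks <- c(igv@tracks, list(track))
            igv
          })

# The convenience layer: build the track from raw data, then delegate.
setMethod("add_track", signature("ToySession", "ToyVCF"),
          function(igv, track, ...) add_track(igv, ToyVariantTrack(track, ...)))
```

With these definitions, add_track(new("ToySession"), new("ToyVCF", nVariants = 3L), title = "bla") and the explicit form that calls ToyVariantTrack() first should yield the same session contents.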
- Paul
> On Mar 14, 2018, at 1:05 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:
>
> Agreed about encapsulating plot parameters. I was thinking in terms of user convenience, relying on defaults.
>
> On Wed, Mar 14, 2018 at 12:40 PM, Paul Shannon <paul.thurmond.shannon at gmail.com> wrote:
> Hi Michael,
>
> Set me straight if I got this wrong. You suggest:
>
> > There should be no need to explicitly construct a track; just rely on dispatch and class semantics, i.e., passing a VCF object to add_track() would create a variant track automatically.
>
> But wouldn’t
>
> displayTrack(vcf)
>
> preclude any easy specification of options - which vary across track types - which are straightforward, easily managed and checked, by a set of track constructors?
>
> Two examples:
>
> displayTrack(VariantTrack(vcf, title="mef2c eqtl", height="300", homrefColor="lightGray",
> homVarColor="darkRed", hetVarColor="lightRed"))
>
> displayTrack(AlignmentTrack(x, title="bam 32", viewAsPairs=TRUE, insertionColor="black"))
>
>
> So I suggest that the visualization of tracks has lots of track-type-specific settings which the user will want to control, and which would be messy to handle with an open-ended set of optional “…” args to a dispatch-capable single “displayTrack” method.
>
> - Paul
>