[Bioc-devel] VRanges-class positive strandness and locateVariants() strandawareness
Robert Castelo
robert.castelo at upf.edu
Fri May 22 21:56:49 CEST 2015
ok, thanks for the clarification, I was using release and did not try devel.
actually, this also affects the function predictCoding() and there,
using ignore.strand=TRUE is not appropriate, so my workaround by now in
release is to coerce the input VRanges to GRanges and set strand to '*'
before calling locateVariants() or predictCoding(). Just mentioning in
case somebody encountered the same situation in release.
cheers,
robert.
On 05/22/2015 09:49 PM, Michael Lawrence wrote:
> This changed recently. VariantAnnotation in devel no longer enforces a
> strand on VRanges, or at least it allows the "*" case.
>
>
> On Fri, May 22, 2015 at 11:33 AM, Robert Castelo <robert.castelo at upf.edu
> <mailto:robert.castelo at upf.edu>> wrote:
>
> Hi,
>
> I have encountered myself in a strange situation when using the
> function locateVariants() from VariantAnnotation with an input
> VRanges object. The problem is that some of the expected coding
> annotations are not showing up when using locateVariants() with
> default parameters.
>
> After investigating this situation I think I found the reason, which
> does not look like a bug but I would like that you give me some
> clarification about the logic behind using locateVariants() with
> VRanges objects.
>
> The documentation of the VRanges-class says that in this class of
> objects "The strand is always constrained to be positive (+).". I
> guess there may be a good reason for this but I could not find it in
> the documentation or googling about it.
>
> This means that when you coerce a CollapsedVCF object (obtained, for
> example, from a VCF file via readVcf()) to a VRanges object, even
> though variants in the VCF may have no strand, they get a positive
> strand in the VRanges object.
>
> The problem arises then, when you call locateVariants() with this
> VRanges object, because features on the negative strand are never
> going to overlap with the variants since, by default, the argument
> ignore.strand=FALSE.
>
> Let me illustrate this with a toy example. Consider the SNP
> rs1129038
> (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1129038) at
> chr15:28356859 with allels A/G. It is located on the 3' UTR of the
> gene HERC2 coded on the negative strand of the human reference
> genome. Let's build a toy VRanges object having this variant:
>
> library(VariantAnnotation)
> vr <- VRanges(seqnames="chr15",
> ranges=IRanges(28356859, 28356859),
> ref="A", alt="G",
> refDepth=5, altDepth=7,
> totalDepth=12, sampleNames="A")
> strand(vr)
> factor-Rle of length 1 with 1 run
> Lengths: 1
> Values : +
> Levels(3): + - *
>
> Let's build now its CollapsedVCF counterpart by using the
> corresponding coercion method and set the strand to "*":
>
> vcf <- asVCF(vr)
> strand(vcf) <- "*"
>
> Now run locateVariants() on both objects with UCSC annotations:
>
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>
> locateVariants(vcf, txdb, region=AllVariants())
> GRanges object with 2 ranges and 9 metadata columns:
> seqnames ranges strand | LOCATION LOCSTART
> LOCEND QUERYID TXID CDSID
> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer>
> <character> <IntegerList>
> [1] chr15 [28356859, 28356859] * | threeUTR 50 50
> 1 55386
> [2] chr15 [28356859, 28356859] * | threeUTR 50 50
> 1 55387
> GENEID PRECEDEID FOLLOWID
> <character> <CharacterList> <CharacterList>
> [1] 8924
> [2] 8924
> -------
> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> locateVariants(vr, txdb, region=AllVariants())
> GRanges object with 1 range and 9 metadata columns:
> seqnames ranges strand | LOCATION LOCSTART
> LOCEND QUERYID TXID CDSID
> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer>
> <integer> <IntegerList>
> [1] chr15 [28356859, 28356859] + | intergenic <NA> <NA>
> 1 <NA>
> GENEID PRECEDEID FOLLOWID
> <character> <CharacterList> <CharacterList>
> [1] <NA> 100132565,100289656,100616223,... 2567
> -------
> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> Note that while we get the 3' UTR annotation from the strandless VCF
> object we do not get it from the VRanges object with the positive
> strand. To make my point clear: this positive strand shows up when
> you coerce a strandless VCF object to a VRanges one, because
> positive strandness seems to be the convention for VRanges objects:
>
> as(vcf, VRanges)
> VRanges object with 1 range and 1 metadata column:
> seqnames ranges strand ref
> alt totalDepth refDepth altDepth
> <Rle> <IRanges> <Rle> <character> <characterOrRle> <integerOrRle>
> <integerOrRle> <integerOrRle>
> [1] chr15 [28356859, 28356859] + A
> G 12 5 7
> sampleNames softFilterMatrix | QUAL
> <factorOrRle> <matrix> | <numeric>
> [1] A | <NA>
> -------
> seqinfo: 1 sequence from an unspecified genome; no seqlengths
> hardFilters: NULL
>
>
> Of course, if I run locateVariants() with the argument
> ignore.strand=TRUE, then I get the expected annotation:
>
> locateVariants(vr, txdb, region=AllVariants(), ignore.strand=TRUE)
> GRanges object with 2 ranges and 9 metadata columns:
> seqnames ranges strand | LOCATION LOCSTART
> LOCEND QUERYID TXID CDSID
> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer>
> <character> <IntegerList>
> [1] chr15 [28356859, 28356859] + | threeUTR 677
> 677 1 55386
> [2] chr15 [28356859, 28356859] + | threeUTR 677
> 677 1 55387
> GENEID PRECEDEID FOLLOWID
> <character> <CharacterList> <CharacterList>
> [1] 8924
> [2] 8924
> -------
> seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>
> So, my question is, given that VRanges objects are enforced to have
> a positive strand, would not be better to have ignore.strand=TRUE as
> default in locateVariants?
>
> Alternatively, I would recommend that locateVariants() issues a
> warning, or maybe an error, when the input object is VRanges and
> ignore.strand=FALSE.
>
> Finally, out of curiosity, why a VRanges object enforces the
> positive strand in all its genomic ranges? Would not be better just
> taking the strand of the CollapsedVCF object when coercing the
> CollapsedVCF object to VRanges?
>
>
> thanks!!
>
>
> robert.
>
> _______________________________________________
> Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
More information about the Bioc-devel
mailing list