[BioC] VariantAnnotation ALT Field
Paul Shannon
pshannon at fhcrc.org
Wed Nov 21 18:19:37 CET 2012
Hi Sam,
Here's a quick workaround:
fixed(vcf)[ , c("REF", "ALT")]
The backstory is that the ALT field is a DNAStringSetList which, until very recently (the change is in bioc-devel), displayed itself via its show method as '########'. Since that was less than helpful, the latest version of VariantAnnotation displays the ALT sequence in a more natural way.
In the meantime, if you are not using bioc-devel, the explicit extraction of REF and ALT demonstrated above should get you part of what you want.
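If you want the alternate alleles as plain character strings, independent of any show method, something along these lines may also work. This is only a sketch: it assumes the alt() and ref() accessors are available in your installed version and that ALT is a DNAStringSetList, as in the output you posted. The names a and alt_chr are just illustrative.

```r
library(VariantAnnotation)

fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Assumption: alt() returns the DNAStringSetList, with one element
# (holding one or more alternate alleles) per variant.
a <- alt(vcf)

# Collapse each element to a single comma-separated string so that
# multi-allelic sites still occupy one entry per variant.
alt_chr <- vapply(as.list(a),
                  function(x) paste(as.character(x), collapse = ","),
                  character(1))

# Side-by-side view of REF and ALT as ordinary character columns.
head(data.frame(REF = as.character(ref(vcf)), ALT = alt_chr,
                stringsAsFactors = FALSE), 3)
```

Going through as.list() and as.character() sidesteps the display code entirely, so the result should not depend on which show method your version has.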
- Paul
On Nov 21, 2012, at 6:50 AM, Samuel Younkin wrote:
> I have been looking at the VariantAnnotation vignette and have encountered something strange. The R code is below. See how the ALT field lists only ########. The vignette, however, correctly shows the alternate allele. The data file chr22.vcf.gz also correctly contains the alternate allele information.
>
> Any suggestions?
>
> Thanks.
>
> Sam
>
> ~~
>
> > library(VariantAnnotation)
> > fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> > vcf <- readVcf(fl, "hg19")
> > head( fixed(vcf), 3 )
> GRanges with 3 ranges and 5 metadata columns:
>               seqnames               ranges strand | paramRangeID
>                  <Rle>            <IRanges>  <Rle> |     <factor>
>     rs7410291       22 [50300078, 50300078]      * |         <NA>
>   rs147922003       22 [50300086, 50300086]      * |         <NA>
>   rs114143073       22 [50300101, 50300101]      * |         <NA>
>                          REF                ALT      QUAL      FILTER
>               <DNAStringSet> <DNAStringSetList> <numeric> <character>
>     rs7410291              A           ########       100        PASS
>   rs147922003              C           ########       100        PASS
>   rs114143073              G           ########       100        PASS
>   ---
>   seqlengths:
>     22
>     NA
> > sessionInfo()
> R version 2.15.2 Patched (2012-10-28 r61038)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
> [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
> [5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2
> [4] GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.20.3 Biobase_2.18.0 biomaRt_2.14.0
> [4] bitops_1.0-5 BSgenome_1.26.1 DBI_0.2-5
> [7] GenomicFeatures_1.10.1 parallel_2.15.2 RCurl_1.95-3
> [10] RSQLite_0.11.2 rtracklayer_1.18.1 stats4_2.15.2
> [13] tools_2.15.2 XML_3.95-0.1 zlibbioc_1.4.0
> >
>