[BioC] Printing Alt alleles using VariantAnnotation
James W. MacDonald
jmacdon at uw.edu
Fri Sep 28 16:44:31 CEST 2012
Hi Mark,
On 9/28/2012 6:06 AM, Mark Dunning wrote:
> Hi all,
>
> I am doing some processing of vcf files using the VariantAnnotation
> package, and eventually I want to write out a table that I can use the
> annovar annotation package tool on
> (http://www.openbioinformatics.org/annovar/). The table needs to be in
> the form
>
> CHR, Start, end, Ref, Alt
>
> e.g.
>
> 1 55 55 T G
> 1 2646 2646 G A
>
> I'm fine extracting the chromosome, start and end. To get the
> reference alleles I do:
>
>> Ref<- as.data.frame(values(ref(vcf))[["REF"]])[,1]
> But the Alt allele is a bit more complicated. If I do something like:
>
>> alternate = as.data.frame(unlist(values(fixed(vcf))[["ALT"]]))[,1]
How about:
alternate <- sapply(values(fixed(vcf))[["ALT"]], paste, collapse = ",")
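To see why collapsing per variant keeps the table rectangular, here is a minimal base-R sketch. It assumes a plain list of character vectors (`alt_list`, a hypothetical stand-in for the ALT column, which in VariantAnnotation is a list-like container of DNA strings), and the column names and toy values are taken from the example rows in the question rather than from a real VCF file.

```r
# Hypothetical stand-in for the ALT column: one character vector per variant.
# Variant 1 has two alternate alleles, variant 2 has one.
alt_list <- list(c("G", "C"), "A")

# unlist() yields one element per allele, so its length can exceed
# the number of variants and no longer lines up with the other columns.
n_alleles <- length(unlist(alt_list))

# Pasting within each list element gives exactly one string per variant.
alternate <- sapply(alt_list, paste, collapse = ",")

# Assemble the five columns in the order CHR, Start, end, Ref, Alt.
annovar_tab <- data.frame(
  CHR   = c("1", "1"),
  Start = c(55L, 2646L),
  End   = c(55L, 2646L),
  Ref   = c("T", "G"),
  Alt   = alternate,
  stringsAsFactors = FALSE
)

# Print as a tab-separated table without header, quotes, or row names.
write.table(annovar_tab, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With a real object, the same `sapply(..., paste, collapse = ",")` call would replace the toy `alt_list`, and passing a `file` argument to `write.table()` would write the table to disk instead of the console.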
Best,
Jim
> The number of rows could be greater than the number of variants in the
> vcf file, especially for indels where more than one alternate allele
> could be found. I can no longer easily construct the data frame.
>
> Is there an easy way to write all alternate alleles for the same
> position in a comma-separated string so that entries in the table
> could be in the form
>
> 1 55 55 T G,C
> (e.g., G and C alternate alleles were found for the SNP at position
> chromosome 1: 55-55)
>
>
> Regards,
>
> Mark
>
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] VariantAnnotation_1.2.11 Rsamtools_1.8.6 Biostrings_2.24.1
> [4] ggplot2_0.9.2.1 GenomicRanges_1.8.13 IRanges_1.14.4
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.18.3 Biobase_2.16.0 biomaRt_2.12.0
> [4] bitops_1.0-4.1 BSgenome_1.24.0 colorspace_1.1-1
> [7] DBI_0.2-5 dichromat_1.2-4 digest_0.5.2
> [10] GenomicFeatures_1.8.3 grid_2.15.1 gtable_0.1.1
> [13] labeling_0.1 lattice_0.20-10 MASS_7.3-21
> [16] Matrix_1.0-9 memoise_0.1 munsell_0.4
> [19] plyr_1.7.1 proto_0.3-9.2 RColorBrewer_1.0-5
> [22] RCurl_1.91-1 reshape2_1.2.1 RSQLite_0.11.2
> [25] rtracklayer_1.16.3 scales_0.2.2 snpStats_1.6.0
> [28] splines_2.15.1 stats4_2.15.1 stringr_0.6.1
> [31] survival_2.36-14 tools_2.15.1 XML_3.9-4
> [34] zlibbioc_1.2.0
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099