[BioC] mutant allele read counts
Valerie Obenchain
vobencha at fhcrc.org
Fri Jun 13 21:29:09 CEST 2014
Hi,
Use readVcfAsVRanges() then coerce to a data.frame.
library(VariantAnnotation)
fl <- system.file("extdata", "chr7-sub.vcf.gz", package="VariantAnnotation")
vr <- readVcfAsVRanges(fl, "hg19")
df <- as.data.frame(vr)
You'll have some extra columns in the data.frame, but you can remove or
rename columns as necessary.
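
As a rough sketch of the remove / rename step, something along these lines should get you close to the CRAVAT layout. It uses a small hand-made data.frame in place of as.data.frame(vr), so it runs without a VCF file. Assumptions on my part: the column names (seqnames, start, ref, alt, sampleNames) are what I expect from the VRanges coercion, so check names(df) on your own object; the "TR" prefix for the UID is just an illustration; and the strand is set to "+" because VCF alleles are reported on the forward strand of the reference.

```r
# Toy stand-in for as.data.frame(vr). The column names are assumed to
# match those produced by coercing a VRanges; verify with names(df).
df <- data.frame(
  seqnames    = c("chr7", "chr7", "chr17"),
  start       = c(116417505L, 140453136L, 37880998L),
  end         = c(116417505L, 140453136L, 37880998L),
  width       = c(1L, 1L, 1L),
  strand      = c("*", "*", "*"),
  ref         = c("G", "T", "G"),
  alt         = c("T", "A", "GT"),
  sampleNames = c("TCGA-02-1523", "TCGA-02-0023", "TCGA-02-0252"),
  stringsAsFactors = FALSE
)

# Keep only the columns CRAVAT asks for, in the order it expects.
cravat <- data.frame(
  UID      = paste0("TR", seq_len(nrow(df))),  # illustrative IDs only
  Chr      = as.character(df$seqnames),
  Position = df$start,
  Strand   = "+",                              # assumption: forward strand
  Ref      = as.character(df$ref),
  Alt      = as.character(df$alt),
  SampleID = as.character(df$sampleNames),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with no header and no quoting.
# tempfile() is a placeholder; substitute your own output path.
out_file <- tempfile(fileext = ".txt")
write.table(cravat, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If your VCF carries per-allele depths, I would expect the refDepth and altDepth columns of the coerced data.frame to hold the read counts, but please confirm that against your own mutect output.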
Valerie
On 06/13/2014 10:46 AM, Murli [guest] wrote:
> Hi,
> I am interested in extracting information for functional annotation using CRAVAT. It requires the data to be in the following format.
> ===========================================
> # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
> TR1    chr17  7577506    -  G      T   TCGA-02-0231
> TR2    chr10  123279680  -  G      A   TCGA-02-3512
> TR3    chr13  49033967   +  C      A   TCGA-02-3532
> TR4    chr7   116417505  +  G      T   TCGA-02-1523
> TR5    chr7   140453136  -  T      A   TCGA-02-0023
> TR6    chr17  37880998   +  G      T   TCGA-02-0252
> Ins1   chr17  37880998   +  G      GT  TCGA-02-0252
> Del1   chr17  37880998   +  GA     G   TCGA-02-0252
> CSub1  chr2   39871235   +  ATGCT  GA  TCGA-02-0252
>
> ===============================================
> http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#
>
> I am trying to extract this information from VCF files generated by mutect. I am using VariantAnnotation to extract this information. I have read the file using readVcf() and renamed the chromosomes to match the txdb.
>
> rowData(newVcfData)
> GRanges with 62991 ranges and 5 metadata columns:
>                 seqnames                 ranges strand   | paramRangeID
>                    <Rle>              <IRanges>  <Rle>   |     <factor>
>    1:109641_A/G     chr1       [109641, 109641]      *   |         <NA>
>    1:526561_T/G     chr1       [526561, 526561]      *   |         <NA>
>    1:691958_G/A     chr1       [691958, 691958]      *   |         <NA>
>    1:763781_A/T     chr1       [763781, 763781]      *   |         <NA>
>       rs6594026     chr1       [782981, 782981]      *   |         <NA>
>             ...      ...                    ...    ... ...          ...
>        rs480725     chrX [154903224, 154903224]      *   |         <NA>
> X:154925893_C/T     chrX [154925893, 154925893]      *   |         <NA>
> X:155038107_C/G     chrX [155038107, 155038107]      *   |         <NA>
> X:155204257_G/T     chrX [155204257, 155204257]      *   |         <NA>
> X:155234730_T/C     chrX [155234730, 155234730]      *   |         <NA>
>                            REF                ALT      QUAL      FILTER
>                 <DNAStringSet> <DNAStringSetList> <numeric> <character>
>    1:109641_A/G              A                  G      8.90        PASS
>    1:526561_T/G              T                  G      9.19        PASS
>    1:691958_G/A              G                  A     13.74        PASS
>    1:763781_A/T              A                  T     16.03        PASS
>       rs6594026              C                  T     11.24        PASS
>             ...            ...                ...       ...         ...
>        rs480725              A                  T      6.39        PASS
> X:154925893_C/T              C                  T      6.53        PASS
> X:155038107_C/G              C                  G      6.64        PASS
> X:155204257_G/T              G                  T      6.35        PASS
> X:155234730_T/C              T                  C      6.51        PASS
> ---
> seqlengths:
>  chr1 chr10 chr11 chr12 chr13 chr14   ...  chr5  chr6  chr7  chr8  chr9  chrX
>    NA    NA    NA    NA    NA    NA   ...    NA    NA    NA    NA    NA    NA
>
>
> Can the information be extracted using VariantAnnotation? I would appreciate your help with this.
> Thanks ../Murli
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
> [2] GenomicFeatures_1.14.5
> [3] AnnotationDbi_1.24.0
> [4] Biobase_2.22.0
> [5] VariantAnnotation_1.8.13
> [6] Rsamtools_1.14.3
> [7] Biostrings_2.30.1
> [8] GenomicRanges_1.14.4
> [9] XVector_0.2.0
> [10] IRanges_1.20.7
> [11] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 stats4_3.0.2
> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158