[BioC] mutant allele read counts
Murli [guest]
guest at bioconductor.org
Fri Jun 13 19:46:23 CEST 2014
Hi,
I am interested in extracting information for functional annotation using CRAVAT. It requires the data to be in the following format.
===========================================
# UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
TR1 chr17 7577506 - G T TCGA-02-0231
TR2 chr10 123279680 - G A TCGA-02-3512
TR3 chr13 49033967 + C A TCGA-02-3532
TR4 chr7 116417505 + G T TCGA-02-1523
TR5 chr7 140453136 - T A TCGA-02-0023
TR6 chr17 37880998 + G T TCGA-02-0252
Ins1 chr17 37880998 + G GT TCGA-02-0252
Del1 chr17 37880998 + GA G TCGA-02-0252
CSub1 chr2 39871235 + ATGCT GA TCGA-02-0252
===============================================
http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#
I am trying to extract this information from VCF files generated by MuTect, using the VariantAnnotation package. I have read the file with readVcf() and renamed the chromosomes to match the TxDb (UCSC-style names such as chr1).
> rowData(newVcfData)
GRanges with 62991 ranges and 5 metadata columns:
                seqnames                 ranges strand   | paramRangeID
                   <Rle>              <IRanges>  <Rle>   |     <factor>
   1:109641_A/G     chr1       [109641, 109641]      *   |         <NA>
   1:526561_T/G     chr1       [526561, 526561]      *   |         <NA>
   1:691958_G/A     chr1       [691958, 691958]      *   |         <NA>
   1:763781_A/T     chr1       [763781, 763781]      *   |         <NA>
      rs6594026     chr1       [782981, 782981]      *   |         <NA>
            ...      ...                    ...    ... ...          ...
       rs480725     chrX [154903224, 154903224]      *   |         <NA>
X:154925893_C/T     chrX [154925893, 154925893]      *   |         <NA>
X:155038107_C/G     chrX [155038107, 155038107]      *   |         <NA>
X:155204257_G/T     chrX [155204257, 155204257]      *   |         <NA>
X:155234730_T/C     chrX [155234730, 155234730]      *   |         <NA>
                           REF                ALT      QUAL      FILTER
                <DNAStringSet> <DNAStringSetList> <numeric> <character>
   1:109641_A/G              A                  G      8.90        PASS
   1:526561_T/G              T                  G      9.19        PASS
   1:691958_G/A              G                  A     13.74        PASS
   1:763781_A/T              A                  T     16.03        PASS
      rs6594026              C                  T     11.24        PASS
            ...            ...                ...       ...         ...
       rs480725              A                  T      6.39        PASS
X:154925893_C/T              C                  T      6.53        PASS
X:155038107_C/G              C                  G      6.64        PASS
X:155204257_G/T              G                  T      6.35        PASS
X:155234730_T/C              T                  C      6.51        PASS
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
Can the information in the CRAVAT format above be extracted using VariantAnnotation? I would appreciate your help with this.
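To make the question concrete, here is a rough sketch of the table I am after, written in base R so it runs without my data. The helper name to_cravat, the "V1", "V2", ... identifiers and the SAMPLE1 label are placeholders of my own, not anything CRAVAT or VariantAnnotation defines. I am also assuming that strand can be set to "+" for every row, since VCF reports REF and ALT on the forward strand of the reference, and that each record carries exactly one ALT allele.

```r
# Hypothetical helper (not part of VariantAnnotation): assemble a CRAVAT-style
# table (UID, chr, position, strand, ref, alt, optional sample ID) from vectors.
to_cravat <- function(chr, pos, ref, alt, sample = NULL, strand = "+") {
  stopifnot(length(chr) == length(pos),
            length(pos) == length(ref),
            length(ref) == length(alt))
  out <- data.frame(
    uid    = sprintf("V%d", seq_along(chr)),   # placeholder identifiers
    chr    = as.character(chr),
    pos    = as.integer(pos),
    strand = rep(strand, length.out = length(chr)),
    ref    = as.character(ref),
    alt    = as.character(alt),
    stringsAsFactors = FALSE
  )
  # The sample ID column is optional in the CRAVAT format
  if (!is.null(sample)) out$sample <- rep(sample, length.out = nrow(out))
  out
}

# Two records copied from the rowData() output above
example <- to_cravat(
  chr    = c("chr1", "chr1"),
  pos    = c(109641, 526561),
  ref    = c("A", "T"),
  alt    = c("G", "G"),
  sample = "SAMPLE1"                           # placeholder sample label
)

# Tab-delimited, no header, no quoting; printed to the console here
write.table(example, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# With the VCF object itself I would guess at something like the following
# (untested; unlist(alt(...)) only lines up if every record has one ALT):
#   rd  <- rowData(newVcfData)
#   tab <- to_cravat(seqnames(rd), start(rd),
#                    ref(newVcfData), unlist(alt(newVcfData)))
```

What I do not know is whether this is the intended way to pull REF and ALT out of the VCF object, or how multi-allelic records should be handled.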
Thanks ../Murli
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
[2] GenomicFeatures_1.14.5
[3] AnnotationDbi_1.24.0
[4] Biobase_2.22.0
[5] VariantAnnotation_1.8.13
[6] Rsamtools_1.14.3
[7] Biostrings_2.30.1
[8] GenomicRanges_1.14.4
[9] XVector_0.2.0
[10] IRanges_1.20.7
[11] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
[5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.22.7 stats4_3.0.2
[9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0
--
Sent via the guest posting facility at bioconductor.org.