[BioC] mutant allele read counts

Murli [guest] guest at bioconductor.org
Fri Jun 13 19:46:23 CEST 2014


Hi, 
I am interested in extracting information for functional annotation using CRAVAT. It requires the data to be in the following format. 
===========================================
# UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
TR1	chr17	7577506	-	G	T	TCGA-02-0231
TR2	chr10	123279680	-	G	A	TCGA-02-3512
TR3	chr13	49033967	+	C	A	TCGA-02-3532
TR4	chr7	116417505	+	G	T	TCGA-02-1523
TR5	chr7	140453136	-	T	A	TCGA-02-0023
TR6	chr17	37880998	+	G	T	TCGA-02-0252
Ins1	chr17	37880998	+	G	GT	TCGA-02-0252
Del1	chr17	37880998	+	GA	G	TCGA-02-0252
CSub1	chr2	39871235	+	ATGCT	GA	TCGA-02-0252

===============================================
http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#

I am trying to extract this information from VCF files generated by MuTect, and I am using VariantAnnotation to do so. I have read the file with readVcf() and renamed the chromosomes to match the txdb naming (chr1, chr2, ...).

> rowData(newVcfData)
GRanges with 62991 ranges and 5 metadata columns:
                  seqnames                 ranges strand   | paramRangeID
                     <Rle>              <IRanges>  <Rle>   |     <factor>
     1:109641_A/G     chr1       [109641, 109641]      *   |         <NA>
     1:526561_T/G     chr1       [526561, 526561]      *   |         <NA>
     1:691958_G/A     chr1       [691958, 691958]      *   |         <NA>
     1:763781_A/T     chr1       [763781, 763781]      *   |         <NA>
        rs6594026     chr1       [782981, 782981]      *   |         <NA>
              ...      ...                    ...    ... ...          ...
         rs480725     chrX [154903224, 154903224]      *   |         <NA>
  X:154925893_C/T     chrX [154925893, 154925893]      *   |         <NA>
  X:155038107_C/G     chrX [155038107, 155038107]      *   |         <NA>
  X:155204257_G/T     chrX [155204257, 155204257]      *   |         <NA>
  X:155234730_T/C     chrX [155234730, 155234730]      *   |         <NA>
                             REF                ALT      QUAL      FILTER
                  <DNAStringSet> <DNAStringSetList> <numeric> <character>
     1:109641_A/G              A                  G      8.90        PASS
     1:526561_T/G              T                  G      9.19        PASS
     1:691958_G/A              G                  A     13.74        PASS
     1:763781_A/T              A                  T     16.03        PASS
        rs6594026              C                  T     11.24        PASS
              ...            ...                ...       ...         ...
         rs480725              A                  T      6.39        PASS
  X:154925893_C/T              C                  T      6.53        PASS
  X:155038107_C/G              C                  G      6.64        PASS
  X:155204257_G/T              G                  T      6.35        PASS
  X:155234730_T/C              T                  C      6.51        PASS
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
      NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA


Can the chromosome, position, reference base and alternate base be extracted into the CRAVAT format using VariantAnnotation? I would appreciate your help with this.
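
To make the question concrete, here is a minimal sketch of the conversion I have in mind. It is only an illustration under assumptions: make_cravat_table is a made-up helper name (not part of any package), the sample ID "sample1" and the output file name are placeholders, and the strand is set to "+" because VCF reports REF and ALT on the forward strand. The commented-out part assumes exactly one ALT allele per record; I have not checked that this holds for my files.

```r
# Hypothetical helper: builds a CRAVAT-style table from plain vectors (base R only).
make_cravat_table <- function(chrom, pos, ref, alt,
                              sample_id = NA_character_,
                              strand = "+", prefix = "TR") {
  data.frame(
    UID      = paste0(prefix, seq_along(chrom)),  # TR1, TR2, ...
    Chr      = as.character(chrom),
    Position = as.integer(pos),
    Strand   = strand,                            # VCF alleles are on the forward strand
    Ref      = as.character(ref),
    Alt      = as.character(alt),
    SampleID = sample_id,
    stringsAsFactors = FALSE
  )
}

# Toy input copied from the first two rows of the rowData() output.
tab <- make_cravat_table(
  chrom     = c("chr1", "chr1"),
  pos       = c(109641, 526561),
  ref       = c("A", "T"),
  alt       = c("G", "G"),
  sample_id = "sample1"                           # placeholder sample name
)

# With the real VCF object the inputs might be pulled out like this
# (commented out; assumes one ALT allele per record):
# gr  <- rowData(newVcfData)
# tab <- make_cravat_table(
#   chrom = as.character(seqnames(gr)),
#   pos   = start(gr),
#   ref   = as.character(ref(newVcfData)),
#   alt   = as.character(unlist(alt(newVcfData)))
# )

# Write a tab-delimited file without header, quotes or row names.
out_file <- file.path(tempdir(), "cravat_input.txt")
write.table(tab, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

If some records carry more than one ALT allele, the unlist() step would no longer line up with the rows, so those records would need to be split first.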
Thanks ../Murli



 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
 [2] GenomicFeatures_1.14.5
 [3] AnnotationDbi_1.24.0
 [4] Biobase_2.22.0
 [5] VariantAnnotation_1.8.13
 [6] Rsamtools_1.14.3
 [7] Biostrings_2.30.1
 [8] GenomicRanges_1.14.4
 [9] XVector_0.2.0
[10] IRanges_1.20.7
[11] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7
 [5] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.7 stats4_3.0.2
 [9] tools_3.0.2        XML_3.98-1.1       zlibbioc_1.8.0


--
Sent via the guest posting facility at bioconductor.org.


