[BioC] Help needed! What wrong with VariantAnnotation and TCGA vcfs
Martin Morgan
mtmorgan at fhcrc.org
Fri Apr 12 07:11:13 CEST 2013
On 04/11/2013 07:55 PM, ying chen wrote:
> Hi guys, sorry to bother you again.
>
> I am new to VariantAnnotation package and keep having some weird errors when testing with TCGA vcfs.
>
>> start.loc <- 55086725
>> end.loc <- 55275031
>> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
>> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
>> params <- ScanVcfParam(which=test.gr)
>> vcf <- readVcf(file, "hg19", params)
>
> ## the above run successful with the vcf coming with the VariantAnnotation package
> ## the following tests the same code with TCGA vcf
>
>> dir()
> [1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz" "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
>> vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
> invalid rownames length
>> hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
> invalid rownames length
>>
>
> I looked at the TCGA vcf and the chr7-sub.vcf included with VariantAnnotation package and could not tell what want wrong, except that TCGA vcf text file has several "PRIMARY" entries.
>
> Any suggestion?
Sorry, this was a bug in _Rsamtools_; it is fixed in version 1.12.1 which will
be available Saturday after 10am Seattle time. The problems are with the
##SAMPLE lines and the ##vcfProcessLog line.
Martin
>
> Thanks a lot for the help!
>
> Ying
>
> The following is the header and first 2 lines of the TCGA vcf
>
> ##fileformat=VCFv4.0
> ##fileDate=20110203
> ##center=UCSC
> ##source="bambam pipeline v1.1"
> ##reference=<ID=ncbi-human-build36,source="ftp://genome.wustl.edu/pub/reference//NCBI-human-build36/all_sequences.bam">
> ##phasing=none
> ##INDIVIDUAL=TCGA-AF-3913
> ##SAMPLE=<ID=NORMAL,Individual="TCGA-AF-3913",Description="Normal Sample",File="/cluster/depot/read/exome/TCGA-AF-3913-11A-01W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession=SRS131301>
> ##SAMPLE=<ID=PRIMARY,Individual="TCGA-AF-3913",Description="Primary Tumor",File="/cluster/depot/read/exome/TCGA-AF-3913-01A-02W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession=SRS131293>
> ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 130">
> ##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation in primary">
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth for all samples">
> ##INFO=<ID=DEL,Number=1,Type=Integer,Description="Deletion X bps away">
> ##INFO=<ID=INS,Number=1,Type=Integer,Description="Insertion X bps away">
> ##INFO=<ID=VT,Number=1,Type=String,Description="Somatic variant type">
> ##INFO=<ID=ProtCh,Number=1,Type=String,Description="Protein change due to somatic variant">
> ##INFO=<ID=SS,Number=1,Type=Integer,Description="Somatic status of sample">
> ##FILTER=<ID=q10,Description="Genotype Quality < 10">
> ##FILTER=<ID=blq,Description="Position overlaps 1000 Genomes Project mapping quality blacklist">
> ##FILTER=<ID=bldp,Description="Position overlap 1000 Genomes Project depth blacklist">
> ##FILTER=<ID=ma,Description="Position in germline has 2+ support for 2+ alleles">
> ##FILTER=<ID=idl10,Description="Position is within 10 bases of an indel">
> ##FILTER=<ID=idls5,Description="Less than 5 reads supporting indel in appropriate tissue">
> ##FILTER=<ID=fa20,Description="Fraction of ALT below 20% of reads">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
> ##FORMAT=<ID=BQ,Number=1,Type=Integer,Description="Average base quality">
> ##FORMAT=<ID=FA,Number=1,Type=Float,Description="Fraction of reads supporting ALT">
> ##tcgaversion=1.0
> ##vcfProcessLog=<InputVCF=</inside/grotto/bambam/coad_read/exome/TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf>,InputVCFSource=<bambam>,InputVCFVer=<1.1>,InputVCFParam=<exome>>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY
> 1 4770 . A G 28 bldp;blq SS=1;VT=SNP;DB;DP=7 GT:DP:BQ:FA 0/1:3:36:0.333 0/1:4:36:0.5
> 1 4793
>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
> [1] splines stats4 parallel stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] cgdv17_0.0.20 TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.0
> [3] GenomicFeatures_1.12.0 GGtools_4.8.0
> [5] GGBase_3.22.0 snpStats_1.10.0
> [7] Matrix_1.0-12 lattice_0.20-15
> [9] survival_2.37-4 org.Hs.eg.db_2.9.0
> [11] RSQLite_0.11.2 DBI_0.2-5
> [13] AnnotationDbi_1.22.1 Biobase_2.20.0
> [15] VariantAnnotation_1.6.1 Rsamtools_1.12.0
> [17] Biostrings_2.28.0 GenomicRanges_1.12.1
> [19] IRanges_1.18.0 BiocGenerics_0.6.0
> [21] BiocInstaller_1.10.0
> loaded via a namespace (and not attached):
> [1] annotate_1.38.0 biomaRt_2.16.0 bit_1.1-10 bitops_1.0-5 BSgenome_1.28.0
> [6] ff_2.2-11 genefilter_1.42.0 grid_3.0.0 RCurl_1.95-4.1 rtracklayer_1.20.0
> [11] tools_3.0.0 XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0
>>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list