[Bioc-devel] VariantAnnotation: verbose output of readVcf

Valerie Obenchain vobencha at fredhutch.org
Thu Jul 16 17:11:05 CEST 2015


Hi Julian,

Yes, the behavior is intentional though I hadn't thought of the annoying 
chatter in the case of chunking. Sorry about that.

The current readVcf() reads/parses fields according to header 
multiplicity and type; fields without headers are skipped. It is on the 
TODO list to be more liberal in reading (especially FORMAT) fields.

Quite a number of people are (a) unaware of header lines and (b) have 
vcf files with incomplete headers. It's confusing to those with 
incomplete headers why all fields aren't read in. So, printing the 
"found" fields was an attempt to communicate which would be read / 
parsed by readVcf().
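
To check in advance which fields have header lines, the header can be
inspected on its own. This is a minimal sketch using the ex2.vcf file
that ships with the package; the variable names (hdr, info_fields,
geno_fields) are only illustrative.

```r
library(VariantAnnotation)

# Example file shipped with the package
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

# Parse only the header, not the records
hdr <- scanVcfHeader(fl)

# IDs of the INFO and FORMAT fields that have a header line;
# the rownames of the header DataFrames are the field IDs
info_fields <- rownames(info(hdr))
geno_fields <- rownames(geno(hdr))

info_fields
geno_fields
```

Fields that appear in the records but not in these vectors are the ones
that would be skipped.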

I've made the following changes in 1.15.21:
- added a 'Header lines' section to the man page to explain this further
- added a 'verbose' arg to readVcf(); when TRUE, the fields found in the 
header are printed
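
For the chunked case, a sketch of how the new argument might be used
follows. It assumes a version (>= 1.15.21) in which 'verbose' can be
passed directly to readVcf() as described above; the exact default and
behavior should be checked against the man page of the installed
version. The name 'chunk' is only illustrative.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- ScanVcfParam(fixed="ALT", geno=c("GT", "GL"), info="LDAF")

# Read 4000 records per call
tab <- TabixFile(fl, yieldSize=4000)
open(tab)

# Assumption: verbose=FALSE silences the "found header lines" output,
# so nothing about the header is printed per chunk
while (nrow(chunk <- readVcf(tab, "hg19", param=param, verbose=FALSE)))
    cat("vcf dim:", dim(chunk), "\n")

close(tab)
```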

Valerie




On 07/16/2015 02:35 AM, Julian Gehring wrote:
> Hi,
>
> In recent versions of 'VariantAnnotation', the 'readVcf' function prints
> information about the header lines:
>
>    fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
>    vcf <- readVcf(fl, "hg19")
>
> shows
>
>    found header lines for 3 ‘fixed’ fields: ALT, QUAL, FILTER
>    found header lines for 6 ‘info’ fields: NS, DP, AF, AA, DB, H2
>    found header lines for 4 ‘geno’ fields: GT, GQ, DP, HQ
>
> When one reads a VCF in chunks, this gets displayed once per chunk:
>
>    fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
>    param <- ScanVcfParam(fixed="ALT", geno=c("GT", "GL"), info=c("LDAF"))
>    tab <- TabixFile(fl, yieldSize=4000)
>    open(tab)
>    while (nrow(vcf_yield <- readVcf(tab, "hg19", param=param)))
>      cat("vcf dim:", dim(vcf), "\n")
>
>    found header lines for 1 ‘fixed’ fields: ALT
>    found header lines for 1 ‘info’ fields: LDAF
>    found header lines for 2 ‘geno’ fields: GT, GL
>    vcf dim: 5 3
>    found header lines for 1 ‘fixed’ fields: ALT
>    found header lines for 1 ‘info’ fields: LDAF
>    found header lines for 2 ‘geno’ fields: GT, GL
>    vcf dim: 5 3
>    found header lines for 1 ‘fixed’ fields: ALT
>    found header lines for 1 ‘info’ fields: LDAF
>    found header lines for 2 ‘geno’ fields: GT, GL
>    vcf dim: 5 3
>    found header lines for 1 ‘fixed’ fields: ALT
>    found header lines for 1 ‘info’ fields: LDAF
>    found header lines for 2 ‘geno’ fields: GT, GL
>
> For larger files, this gets a bit cumbersome. It looks to me like debug
> information. Is this behavior intentional?
>
> Best wishes
> Julian
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fredhutch.org
Phone: (206) 667-3158


