[BioC] VariantAnnotation: Performance and memory issues in readVcf
Ulrich Bodenhofer
bodenhofer at bioinf.jku.at
Thu May 16 14:34:26 CEST 2013
Thanks for your reply, Valerie!
> [...]
> You mention that one file gives you 2 warnings but another gives you
> 50. Are the other 50 warnings the same?
I checked the warning messages again and it turned out that I was wrong:
the "duplicate keys" message does not appear many times. Consistent with
the ScanVcfParam example I sent yesterday, it appears only twice. All
other warning messages (at least the ones that I can see with
warnings()) read as follows:

    unpackVcf field 'AD': NAs introduced by coercion

R only reports the first 50 warnings, so I do not know how often this
one appears, but my estimate is that it appears once per record in the
VCF subset (8,757 records in my example). Do you think that this number
of warnings could cause the observed performance bottleneck? Whether or
not this is the source of the problem, the lesson I learned is that I
should always restrict myself to the minimum necessary information when
reading a VCF file. So thanks to you and Vincent for your great help!
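
Since R stores only the first 50 warnings by default (see
getOption("nwarnings")), one way to check the estimate above would be to
count the warnings as they are raised. The following is only a minimal
sketch in base R: count_warnings is a hypothetical helper name, not part
of VariantAnnotation, and the loop merely stands in for the actual
readVcf() call.

```r
# Hypothetical helper (not part of VariantAnnotation): evaluate an
# expression, record every warning it raises, and tally the messages,
# independent of R's default cap of 50 stored warnings.
count_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,  # the promise is forced here, so its warnings reach the handler
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the message text
      invokeRestart("muffleWarning")         # suppress the warning, keep going
    }
  )
  list(value = value, counts = table(msgs))
}

# Stand-in workload: 120 coercions that each raise one warning.
res <- count_warnings({
  for (i in 1:120) as.integer("not a number")
  "done"
})
print(res$counts)
```

Wrapping the real call the same way, e.g. count_warnings(readVcf(...)),
should give the exact number of 'AD' coercion warnings, assuming they are
raised as ordinary R warnings.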
Best regards,
Ulrich