[Bioc-devel] VariantAnnotation bug when loading multiple regions from vcf

Valerie Obenchain vobencha at fhcrc.org
Tue Mar 5 19:13:04 CET 2013


Hi Moritz,

Thanks for the bug report and suggested fix. The patch has been applied 
to 1.4.10 in release and 1.5.42 in devel.

Valerie


On 03/05/2013 07:05 AM, Moritz Gerstung wrote:
> Hi,
>
> I discovered the following problem in the geno() elements of the VCF object when I load multiple regions from a .vcf file:
>
> ## Correct
>> tmp = readVcf(system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation"), "hg19")[1:10]
>> geno(tmp)$GT
>             HG00096 HG00097 HG00099 HG00100 HG00101
> rs7410291   "0|0"   "0|0"   "1|0"   "0|0"   "0|0"
> rs147922003 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs114143073 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs141778433 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs182170314 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs115145310 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs186769856 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs77627744  "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs193230365 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs9627788   "0|0"   "0|0"   "1|0"   "0|0"   "0|0"
>
> ## Wrong: Load regions separately using ScanVcfParam(which= )
>> tmp2 = readVcf(system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation"), "hg19", param=ScanVcfParam(which=rowData(tmp)))
>> geno(tmp2)$GT
>             HG00096 HG00097 HG00099 HG00100 HG00101
> rs7410291   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs147922003 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs114143073 "1|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs141778433 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs182170314 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs115145310 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs186769856 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs77627744  "0|0"   "0|0"   "0|0"   "0|0"   "1|0"
> rs193230365 "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
> rs9627788   "0|0"   "0|0"   "0|0"   "0|0"   "0|0"
>
> Note how the elements have been added by column instead of by row.
>
> The following solved the issue for me:
>
> $svn diff VariantAnnotation/
> Index: VariantAnnotation/R/AllUtilities.R
> ===================================================================
> --- VariantAnnotation/R/AllUtilities.R	(revision 73920)
> +++ VariantAnnotation/R/AllUtilities.R	(working copy)
> @@ -62,8 +62,8 @@
>                      cmb
>                  } else {
>                      trans <- lapply(elt, t)
> -                    cmb <- matrix(t(do.call(c, trans)),
> -                                  length(lst[["rowData"]]), d[2])
> +                    cmb <- matrix(do.call(c, trans),
> +                                  length(lst[["rowData"]]), d[2], byrow=TRUE)
>                      cmb
>                  }
>              })
>
>
> My suspicion is that the transposition t() on line 65 didn't do the desired job, as do.call() returned a vector..
>
> I don't fully understand the code, so it may be worth double checking.
> For example, I didn't check the case when the individual geno entries contain more than a single value for each variant and sample.
>
> The same problem also occurred for VariantAnnotation_1.4.9.
>
> Best wishes,
>
> Moritz
>
>
>
> P.S.
>
>> sessionInfo()
> R Under development (unstable) (2013-02-05 r61843)
> Platform: x86_64-apple-darwin11.4.2 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.5.41 Rsamtools_1.11.20        Biostrings_2.27.11
> [4] biomaRt_2.15.0           GenomicRanges_1.11.34    IRanges_1.17.35
> [7] BiocGenerics_0.5.6       BiocInstaller_1.9.6      RColorBrewer_1.0-5
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.21.12   Biobase_2.19.2          bitops_1.0-5
> [4] BSgenome_1.27.1         DBI_0.2-5               GenomicFeatures_1.11.13
> [7] RCurl_1.95-3            RSQLite_0.11.2          rtracklayer_1.19.9
> [10] stats4_3.0.0            tools_3.0.0             XML_3.95-0.1
> [13] zlibbioc_1.5.0
>
>
>
> ---
> Moritz Gerstung
> Postdoctoral Fellow
>
> Cancer Genome Project
> Wellcome Trust Sanger Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridgeshire
> CB10 1SA
> UK
>
>
>



More information about the Bioc-devel mailing list