[Bioc-devel] asVCF error coming from normalizeSingleBracketSubscript

Mon Nov 25 23:08:42 CET 2013

Hi Stephanie,

The error is thrown from SeqArray:::.info at line 216 in the file and is 
related to the handing of NA values.

   x[x == ""] <- NA

Output from the == comparison can contain NAs and therefore can't be 
used (consistently) in subsetting operations.

'x' is a NumericList.

Browse[2]> x
NumericList of length 5
[[1]] 0.5
[[2]] 0.017000000923872
[[3]] 0.333000004291534 0.666999995708466
[[4]] <NA>
[[5]] <NA> <NA>

Here we see NAs returned for the NA values,

Browse[2]> x == ""
LogicalList of length 5
[[1]] FALSE
[[2]] FALSE
[[3]] FALSE FALSE
[[4]] <NA>
[[5]] <NA> <NA>

which fail on subsetting.

Browse[2]> x[x == ""]
Error in normalizeSingleBracketSubscript(i, x) : subscript contains NAs

One solution is use %in% which does not return NAs.

Browse[2]> x %in% ""
LogicalList of length 5
[[1]] FALSE
[[2]] FALSE
[[3]] FALSE FALSE
[[4]] FALSE
[[5]] FALSE FALSE

Valerie

On 11/22/2013 03:11 PM, Stephanie M. Gogarten wrote:
> Hi Valerie,
>
> The asVCF method in SeqArray is failing as of today with a (to me)
> mysterious error.  I get it for the test files chr22.vcf.gz, ex2.vcf,
> and gl_chr1.vcf in extdata of VariantAnnotation, but not for
> SeqArray/extdata/CEU_Exon.vcf.  Do you have any suggestions of where I
> might look to figure out where this error is coming from?
>
> thanks,
> Stephanie
>
>  > vcffile <- system.file("extdata", "ex2.vcf",
> package="VariantAnnotation")
>  > gdsfile <- tempfile()
>  > seqVCF2GDS(vcffile, gdsfile)
>  > gdsobj <- seqOpen(gdsfile)
>  > options(error=recover)
>  > vcfg <- asVCF(gdsobj)
> Error in normalizeSingleBracketSubscript(i, x) : subscript contains NAs
>
> Enter a frame number, or 0 to exit
>
>   1: asVCF(gdsobj)
>   2: asVCF(gdsobj)
>   3: .local(x, ...)
>   4: VCF(rowData = .rowData(x), colData = .colData(x), exptData =
> SimpleList(hea
>   5: .info(x, info)
>   6: `[<-`(`*tmp*`, x == "", value = NA)
>   7: `[<-`(`*tmp*`, x == "", value = NA)
>   8: lsubset_List_by_List(x, i, value)
>   9: .fast_lsubset_List_by_List(x, i, value)
> 10: replaceROWS(unlisted_x, unlisted_i, unlisted_value)
> 11: replaceROWS(unlisted_x, unlisted_i, unlisted_value)
> 12: extractROWS(setNames(seq_along(x), names(x)), i)
> 13: extractROWS(setNames(seq_along(x), names(x)), i)
> 14: normalizeSingleBracketSubscript(i, x)
>
>  > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.8.6 Rsamtools_1.14.1        Biostrings_2.30.1
> [4] GenomicRanges_1.14.3    XVector_0.2.0           IRanges_1.20.6
> [7] BiocGenerics_0.8.0      SeqArray_1.2.0          gdsfmt_1.0.0
>
> loaded via a namespace (and not attached):
>   [1] AnnotationDbi_1.24.0   Biobase_2.22.0         biomaRt_2.18.0
>   [4] bitops_1.0-6           BSgenome_1.30.0        DBI_0.2-7
>   [7] GenomicFeatures_1.14.2 RCurl_1.95-4.1         RSQLite_0.11.4
> [10] rtracklayer_1.22.0     stats4_3.0.2           tools_3.0.2
> [13] XML_3.95-0.2           zlibbioc_1.8.0