[Bioc-devel] ShortRead::countLines integer overflow with large fastq files

Thomas Girke thomas.girke at ucr.edu
Wed Feb 21 04:08:13 CET 2018


Dear Martin,

countLines in ShortRead returns the line counts as integers, which appears
to create problems with large FASTQ files (>536.8 Mio reads, i.e. more than
2^31-1 lines) due to R's integer limit (2^31-1). When the integer limit is
exceeded, countLines seems to return negative values that no longer reflect
the number of lines in a file. At least this is what I learned after several
users reported this problem and I then ran some tests myself on large FASTQ
files with line numbers just below and above the integer limit. If my
conclusion is correct and there aren't any strong reasons against it, would
it be possible to consider returning numeric values instead, either by
default or conditionally (e.g. when the count is >= .Machine$integer.max),
to lift this limit? If this is not possible, then returning NAs instead of
negative values would be a sensible compromise.
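
To illustrate the arithmetic, here is a minimal sketch in base R. It does
not call countLines and is not the ShortRead implementation. The helpers
wrap32() and safe_total() are hypothetical names introduced only for this
example, and the assumption that the negative values come from a 32-bit
signed counter wrapping around is mine, not something confirmed from the
package source.

```r
# R integers are 32-bit signed, so the largest value is 2^31 - 1.
int_max <- .Machine$integer.max

# A FASTQ record spans 4 lines, so the limit is crossed just above
# int_max / 4 reads (roughly 536.87 million).
reads_at_limit <- int_max / 4

# Within R itself, integer arithmetic past the limit yields NA (with a
# warning), not a negative number.
overflowed <- suppressWarnings(int_max + 1L)

# Hypothetical helper: what a 32-bit signed counter would report for a
# true count x if it silently wrapped around (assumed cause of the
# negative values).
wrap32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

# Hypothetical helper: add up per-file (or per-chunk) counts as doubles,
# which represent whole numbers exactly up to 2^53.
safe_total <- function(counts) sum(as.numeric(counts))

wrap32(2^31 + 10)            # negative, although the true count is positive
safe_total(c(int_max, 10L))  # exceeds int_max without overflowing
```

Summing counts as doubles in this way is what returning numeric values
would amount to on the R side.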

Thanks,

Thomas

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  utils     datasets  grDevices
[8] methods   base

other attached packages:
 [1] ShortRead_1.36.0           GenomicAlignments_1.14.1
 [3] SummarizedExperiment_1.8.0 DelayedArray_0.4.1
 [5] matrixStats_0.52.2         Biobase_2.38.0
 [7] Rsamtools_1.30.0           GenomicRanges_1.30.0
 [9] GenomeInfoDb_1.14.0        Biostrings_2.46.0
[11] XVector_0.18.0             IRanges_2.12.0
[13] S4Vectors_0.16.0           BiocParallel_1.12.0
[15] BiocGenerics_0.24.0        setwidth_1.0-4
[17] colorout_1.1-3

loaded via a namespace (and not attached):
 [1] zlibbioc_1.24.0         lattice_0.20-35         hwriter_1.3.2
 [4] tools_3.4.2             grid_3.4.2              latticeExtra_0.6-28
 [7] Matrix_1.2-12           GenomeInfoDbData_0.99.1 RColorBrewer_1.1-2
[10] bitops_1.0-6            RCurl_1.95-4.8          compiler_3.4.2
