[Rd] R crashes when using huge data sets with character string variables

Arne Henningsen @rne@henn|ng@en @end|ng |rom gm@||@com
Sun Dec 13 00:19:40 CET 2020


When working with a huge data set with character string variables, I
found that various commands make R crash. When I run R in a
Linux/bash console, R terminates with the message "Killed". When I use
RStudio, I get the message "R Session Aborted. R encountered a fatal
error. The session was terminated. Start New Session". If an object in
the R workspace needs too much memory, I would expect R not to crash
but to issue an error message such as "Error: cannot allocate vector
of size ...". A minimal reproducible example (at least on my computer)
is:

nObs <- 1e9

date <- paste( round( runif( nObs, 1981, 2015 ) ),
               round( runif( nObs, 1, 12 ) ),
               round( runif( nObs, 1, 31 ) ), sep = "-" )
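A scaled-down version of the same call can give a rough idea of how
much memory the full example needs. The sketch below is an illustration
added here, not part of the original report: it builds the same kind of
vector for a hypothetical nSmall = 1e6 observations, measures it with
object.size(), and extrapolates linearly to 1e9. Linear scaling is an
assumption; it is roughly right because the result is mostly one 8-byte
pointer per element (on a 64-bit build), pointing to at most
35 x 12 x 31 distinct date strings.

```r
# Hypothetical scaled-down version of the example above: build the same
# kind of vector for nSmall observations and measure its size.
nSmall <- 1e6
set.seed(1)  # only for reproducibility of this sketch
dateSmall <- paste( round( runif( nSmall, 1981, 2015 ) ),
                    round( runif( nSmall, 1, 12 ) ),
                    round( runif( nSmall, 1, 31 ) ), sep = "-" )

# Size of the result in bytes: one pointer per element plus each distinct
# string counted once (at most 35 * 12 * 31 distinct dates here)
sizeSmall <- as.numeric( utils::object.size( dateSmall ) )

# Assumption: linear extrapolation to the 1e9 observations of the original
# example; this slightly overstates the (bounded) cost of the distinct strings
nObs <- 1e9
sizeFull <- sizeSmall * nObs / nSmall
cat( sprintf( "estimated size of 'date': %.1f GB\n", sizeFull / 1e9 ) )

# Each intermediate numeric vector of length nObs needs 8 bytes per element;
# every runif() and round() call creates one of these along the way
cat( sprintf( "each numeric vector of length nObs: %.1f GB\n", 8 * nObs / 1e9 ) )
```

The estimate covers only the final result; the intermediate vectors
created while evaluating the call come on top of it.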

Is this a bug or a feature of R?

Some information about my R version, OS, etc.:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name
