[Rd] best way to write tests when sort() evaluates differently in R CMD check due to LC_COLLATE locale setting?
Skye Bender-deMoll
skyebend at skyeome.net
Mon Apr 14 23:36:51 CEST 2014
Dear R devel,
What is the correct way to write package tests that could possibly fail
due to locale collation behavior? Is it safe/proper for me to call
Sys.setlocale("LC_COLLATE", "en_US.UTF-8") in each test file? Or should
I explicitly force collation to C before writing tests? Or do I need to
always call sort() on my comparison objects to ensure they are sorted in
the same locale-specific way?
I ran into a strange situation where a package test I'm writing fails
under R CMD check but runs fine in the R terminal. I eventually found
that, under R CMD check, the vector I compare against to evaluate the
test result was not sorted in the order I expected. Further digging
revealed that the locale's LC_COLLATE value is set to "C" in R CMD check,
while it is "en_US.UTF-8" in my R terminal.
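One way to confirm this kind of mismatch is to record the collation
setting from inside the test file itself. A minimal sketch using only
base R; the variable name `collate` is just illustrative:

```r
# Report the collation locale that is active while the test runs.
# Under R CMD check this is expected to be "C"; interactively it
# depends on the user's environment.
collate <- Sys.getlocale("LC_COLLATE")
message("LC_COLLATE for this run: ", collate)
```

Comparing the value reported in an interactive session with the one
reported under R CMD check shows whether collation differs between them.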
Now that I know what to look for in the documentation, I realize that
this is a feature. p. 36 of "Writing R Extensions" states:
"All these tests are run with collation set to the C locale, and for the
examples and tests with environment variable LANGUAGE=en: this is to
minimize differences between platforms."
This appears to affect the sort order of capital letters:
> Sys.setlocale("LC_COLLATE", "C")
[1] "C"
> sort(c("a",'A','b','c'))
[1] "A" "a" "b" "c"
> Sys.setlocale("LC_COLLATE", "en_US.UTF-8")
[1] "en_US.UTF-8"
> sort(c("a",'A','b','c'))
[1] "a" "A" "b" "c"
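For tests whose expected values depend on sort order, one option would
be to pin collation just for the comparison and restore the previous
setting afterwards. The sketch below is only an assumption about how
that could look, not an established recommendation; `with_c_collation`
is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper: evaluate an expression with LC_COLLATE forced to
# "C", then restore whatever collation was active before the call.
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # expr is a lazily evaluated promise, so it is only evaluated here,
  # after the collation has been switched.
  force(expr)
}

# Same input as the transcript above; in the C locale capitals sort first.
sorted <- with_c_collation(sort(c("a", "A", "b", "c")))
stopifnot(identical(sorted, c("A", "a", "b", "c")))
```

Because the helper restores the earlier setting on exit, the rest of the
test file keeps running under whatever collation it started with.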
best,
-skye