[R] 'format' behaviour in an 'apply' call depending on 'options(digits = K)'
Mathieu Basille
basille.web at ase-research.org
Tue Jul 30 18:01:21 CEST 2013
Dear list,
Here is a simple example in which the behaviour of 'format' does not make
sense to me. I have read the documentation and searched the archives, but
nothing pointed me in the right direction to understand this behaviour.
Let's start with a simple data frame:
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
Let's now create a new variable 'id2' which is the character representation
of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers
such as 100,000 are not formatted using their scientific representation (in
this case 1e+05):
df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
Let's have a look at part of the result:
df1$id2[99990:100010]
[1] "99990" "99991" "99992" "99993" "99994" "99995" "99996"
[8] "99997" "99998" "99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
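For reference, a minimal sketch of what 'scientific = FALSE' changes for a single value. It assumes the default settings 'digits = 7' and 'scipen = 0', which are set explicitly here so the result does not depend on the session; the variable names are only illustrative:

```r
# Pin the options this sketch assumes, remembering the previous values.
old <- options(digits = 7, scipen = 0)

default_repr <- format(1e5)                      # scientific notation is shorter here
fixed_repr   <- format(1e5, scientific = FALSE)  # fixed notation is forced

print(c(default_repr, fixed_repr))

# Restore whatever was set before.
options(old)
```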
So far, so good. Let's now play with the 'digits' option:
options(digits = 4)
df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df2$id2[99990:100010]
[1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996"
[8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
Notice the extra leading space from 99995 to 99999? To make sure it only
happened there:
df2$id2[which(df1$id2 != df2$id2)]
[1] " 99995" " 99996" " 99997" " 99998" " 99999"
And just to make sure it only occurs in an 'apply' call, here is the same
directly on a numeric vector:
id2 <- format(1:110000, scientific = FALSE)
id2[99990:100010]
[1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
[8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
Here every number with fewer than six digits gets a leading space, which
makes sense to me. Is there anything I'm misinterpreting in the behaviour
of 'format'?
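One difference between the two cases that may be relevant, offered as a sketch rather than an explanation: 'apply' coerces a data frame to a matrix before calling the function, so 'format' receives each 'id' as a single double, whereas the direct call formats the whole integer vector at once. The data frame 'small' below is a made-up stand-in for 'df1' with the same column types:

```r
# Hypothetical small stand-in for df1 (two double columns, one integer column).
small <- data.frame(x = c(0.5, 1.5), y = c(2.5, 3.5), id = 1:2)

# Inside apply(): record the storage type and length of what format() would get.
seen_in_apply <- apply(small, 1, function(dfi) {
  paste(typeof(dfi["id"]), length(dfi["id"]))
})

# Outside apply(): the column is still an integer vector of full length.
seen_directly <- paste(typeof(small$id), length(small$id))

print(seen_in_apply)
print(seen_directly)
```

So in the 'apply' version every value is formatted on its own, with no common width shared across rows, and as a double rather than an integer.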
Thanks in advance for any hint,
Mathieu.
PS: Some background for this question. It all comes from an Rmd document
that knitr consistently failed to process, while the R code ran fine in
batch or interactive R. knitr uses 'options(digits = 4)', as opposed to
the R default of 'options(digits = 7)', which made one of my functions
throw an error with knitr, but not with batch or interactive R. I managed
to solve the problem using 'trim = TRUE' in 'format', but I still do not
understand what's going on...
If you're interested, see here for more details on the original problem:
http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
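For completeness, a minimal sketch of the 'trim = TRUE' workaround mentioned above. With 'trim = TRUE', 'format' does not pad to a common width, so no leading spaces should appear whatever 'digits' is set to. Formatting the whole vector in one call instead of row by row through 'apply' is an assumption about a simpler equivalent, not what the original function did:

```r
old <- options(digits = 4)  # the value the PS attributes to knitr

ids <- 99990:100010

# Whole integer vector at once, without padding.
trimmed <- format(ids, scientific = FALSE, trim = TRUE)

# A single double, as a function called through apply() would receive it.
single <- format(99995, scientific = FALSE, trim = TRUE)

# Check that no element carries a leading space.
print(any(grepl("^ ", c(trimmed, single))))

options(old)
```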
--
~$ whoami
Mathieu Basille, PhD
~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille
~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
-- Paul Éluard