[R] 'format' behaviour in an 'apply' call depending on 'options(digits = K)'
David Winsemius
dwinsemius at comcast.net
Tue Jul 30 19:58:37 CEST 2013
On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
> Dear list,
>
> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
>
> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>
> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
>
> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>
> Let's have a look at part of the result:
>
> df1$id2[99990:100010]
> [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996"
> [8] "99997" "99998" "99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
Some formatting steps are carried out by system functions, so results may differ between platforms. In this case I am unable to reproduce the problem with the same code on Mac OS 10.7.5 / R 3.0.1 Patched:
> df1$id2[99990:100010]
[1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
[9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
[17] "100006" "100007" "100008" "100009" "100010"
(I did notice that generation of the id2 variable seemed to take an inordinately long time.)
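On the running time: the row-wise apply() call formats one value per row. A minimal sketch of a vectorized alternative, assuming only that 'id' is the integer column created above, formats the whole column in a single call. 'trim = TRUE' (the fix mentioned in the PS) suppresses the padding to a common width that format() applies to a vector by default.

```r
# Sketch: format the whole integer 'id' column at once instead of row by row.
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

# One vectorized call; trim = TRUE drops the padding to a common width
# that format() otherwise applies to every element of the vector.
df1$id2 <- format(df1$id, scientific = FALSE, trim = TRUE)

# For an integer vector, as.character() gives the same strings here.
id3 <- as.character(df1$id)

df1$id2[99990:100010]
```

Because 'id' stays an integer vector, neither call goes through scientific notation for 100000.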
--
David.
>
> So far, so good. Let's now play with the 'digits' option:
>
> options(digits = 4)
> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> df2$id2[99990:100010]
> [1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996"
> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Notice the extra leading space from 99995 to 99999? To make sure it only happened there:
>
> df2$id2[which(df1$id2 != df2$id2)]
> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>
> And just to make sure it only occurs in an 'apply' call, here is the same directly on a numeric vector:
>
> id2 <- format(1:110000, scientific = FALSE)
> id2[99990:100010]
> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
> Thanks in advance for any hint,
> Mathieu.
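One difference between the two cases can be checked directly: apply() converts a data frame with as.matrix() before iterating, and because 'x' and 'y' are double, the resulting matrix (and so each row passed to the function) is double, whereas 1:110000 is an integer vector. The sketch below uses a small illustrative data frame named 'small' (a hypothetical name, not from the thread). It does not by itself explain why only 99995 to 99999 are padded; it only shows that the apply() version formats single double values while the direct version formats one integer vector.

```r
# Small illustrative data frame with the same column layout as df1.
small <- data.frame(x = rnorm(3), y = rnorm(3), id = 1:3)

class(small$id)                  # "integer": the column itself is integer
storage.mode(as.matrix(small))   # "double": what apply() actually iterates over

# Class of the 'id' element that apply() hands to the function for each row.
row_classes <- apply(small, 1, function(dfi) class(dfi["id"]))
row_classes
```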
>
>
> PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code ran fine in batch or interactive R. knitr uses 'options(digits = 4)', whereas R's default is 'options(digits = 7)', which made one of my functions throw an error with knitr but not in batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...
> If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>
>
> --
>
> ~$ whoami
> Mathieu Basille, PhD
>
> ~$ locate --details
> University of Florida \\
> Fort Lauderdale Research and Education Center
> (+1) 954-577-6314
> http://ase-research.org/basille
>
> ~$ fortune
> « The whole point is to say everything, and I lack the words
> And I lack the time, and I lack the daring. »
> -- Paul Éluard
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA