[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Mathieu Basille basille.web at ase-research.org
Thu Aug 1 17:31:28 CEST 2013


This problem does not seem to be widespread, but it affects at least two 
users (both on Linux, which may be a hint). To me it looks like a bug, though 
I don't know whether it is an R bug or an OS-related one. Should I forward it 
to R-devel, or to some other place where R gurus may have a chance to look at it?
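
In case it helps whoever looks at this, here is a smaller sketch that should 
exercise the same code path without the data frame. Since 'apply' converts the 
data frame to a numeric matrix, 'format' receives one double at a time. My 
guess, not confirmed anywhere in this thread, is that the field width is 
computed after rounding to 'digits' significant digits: 99995 to 99999 are 
exactly the ids that round to 100000 at 4 significant digits. The vector 'ids' 
and the result names are only illustrative, and since the output may differ 
between R versions and platforms, I do not show any.

```r
# Minimal sketch: format() applied to one double at a time, as inside apply().
# 'ids', 'one_by_one' and 'trimmed' are illustrative names, not from the thread.
old <- options(digits = 4)   # the setting under which the problem appeared

ids <- c(99994, 99995, 99999, 100000)

# Same call as in the apply() example, one value per call.
one_by_one <- vapply(ids, format, character(1), scientific = FALSE)

# The workaround discussed in the thread: trim = TRUE suppresses padding.
trimmed <- vapply(ids, format, character(1), scientific = FALSE, trim = TRUE)

print(one_by_one)
print(nchar(one_by_one))     # a width of 6 for a 5-digit id would show the padding
print(trimmed)

options(old)                 # restore the previous 'digits' setting
```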

Mathieu.


On 07/30/2013 02:34 PM, arun wrote:
> Hi Mathieu,
> Yes, the original problem occurs on my system too. I am using R 3.0.1 on Linux Mint 15. I guess the default is trim = FALSE, but it still looks very strange, especially in the apply() call, where the padding only starts from " 99995" onwards.
>
> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] stringr_0.6.2  reshape2_1.2.2
>
> loaded via a namespace (and not attached):
> [1] plyr_1.8    tools_3.0.1
>
>
> ----- Original Message -----
> From: Mathieu Basille <basille.web at ase-research.org>
> To: arun <smartpink111 at yahoo.com>
> Cc: R help <r-help at r-project.org>
> Sent: Tuesday, July 30, 2013 2:29 PM
> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>
> Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
> of the problem, and this is the solution I'm currently using. However, it
> does not help to understand what the problem is, and what is the cause of it.
>
> Can you confirm that the original problem also occurs on your computer (and
> what is your OS)? It would be interesting since David is not able to
> reproduce the problem with Mac OS X.
> Mathieu.
>
>
> On 07/30/2013 02:15 PM, arun wrote:
>> Hi,
>> Try using trim=TRUE, in ?format()
>> options(digits=4)
>>
>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim = TRUE, scientific = FALSE))
>> df2$id2[99990:100010]
>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>> # [17] "100006" "100007" "100008" "100009" "100010"
>>
>>
>> id2 <- format(1:110000, scientific = FALSE, trim = TRUE)
>> id2[99990:100010]
>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>> # [17] "100006" "100007" "100008" "100009" "100010"
>> A.K.
>>
>>
>> ----- Original Message -----
>> From: Mathieu Basille <basille.web at ase-research.org>
>> To: David Winsemius <dwinsemius at comcast.net>
>> Cc: r-help at r-project.org
>> Sent: Tuesday, July 30, 2013 2:07 PM
>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>>
>> Thanks David for your interest. I have to admit that your answer puzzles me
>> even more than before. It seems that the underlying problem is way beyond
>> my R skills...
>>
>> The generation of id2 is indeed quite demanding, especially compared to a
>> simple 'as.character' call. Anyway, since it seems to be system specific,
>> here is the sessionInfo() that I forgot to attach to my first message:
>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> In brief: the latest stable R available under Debian Testing. Hopefully this
>> can help track down the problem.
>> Mathieu.
>>
>>
>> On 07/30/2013 01:58 PM, David Winsemius wrote:
>>>
>>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>>>
>>>> Dear list,
>>>>
>>>> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
>>>>
>>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>
>>>> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
>>>>
>>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>>
>>>> Let's have a look at part of the result:
>>>>
>>>> df1$id2[99990:100010]
>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>>>>  [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>
>>> Some formatting is carried out by system functions. In this case I am unable to reproduce the problem with the same code on Mac OS X 10.7.5 / R 3.0.1 Patched:
>>>
>>>> df1$id2[99990:100010]
>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>> [17] "100006" "100007" "100008" "100009" "100010"
>>>
>>> (I did notice that generation of the id2 variable seemed to take an inordinately long time.)
>>>
>>> -- David.
>>>>
>>>> So far, so good. Let's now play with the 'digits' option:
>>>>
>>>> options(digits = 4)
>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>> df2$id2[99990:100010]
>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>>>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>
>>>> Notice the extra leading space from 99995 to 99999? To make sure it only happened there:
>>>>
>>>> df2$id2[which(df1$id2 != df2$id2)]
>>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>>>
>>>> And just to make sure it only occurs in an 'apply' call, here is the same operation applied directly to a numeric vector:
>>>>
>>>> id2 <- format(1:110000, scientific = FALSE)
>>>> id2[99990:100010]
>>>>  [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>>>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>
>>>> Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
>>>> Thanks in advance for any hint,
>>>> Mathieu.
>>>>
>>>>
>>>> PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code ran fine in batch or interactive R. knitr uses 'options(digits = 4)', as opposed to the R default of 'options(digits = 7)', which made one of my functions throw an error with knitr but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...
>>>> If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>>>>
>>>>
>>>> --
>>>>
>>>> ~$ whoami
>>>> Mathieu Basille, PhD
>>>>
>>>> ~$ locate --details
>>>> University of Florida \\
>>>> Fort Lauderdale Research and Education Center
>>>> (+1) 954-577-6314
>>>> http://ase-research.org/basille
>>>>
>>>> ~$ fortune
>>>> « Le tout est de tout dire, et je manque de mots
>>>> Et je manque de temps, et je manque d'audace. »
>>>> -- Paul Éluard
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>
>>
>


