[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Mathieu Basille basille.web at ase-research.org
Tue Aug 27 05:21:28 CEST 2013


Thanks to Aleksey Vorona and Duncan Murdoch, this bug is now fixed in R-devel!

Mathieu.


On 08/01/2013 01:47 PM, William Dunlap wrote:
> You could report it as a bug at
>    https://bugs.r-project.org/bugzilla3/
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: Mathieu Basille [mailto:basille.web at ase-research.org]
>> Sent: Thursday, August 01, 2013 10:31 AM
>> To: R help
>> Cc: William Dunlap
>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>>
>> Nicely spotted, Bill! You went much farther than I could have. We can
>> basically summarize the problem with the following simple example:
>>
>>   > format(9994, digits = 3)
>> [1] "9994"
>>   > format(9995, digits = 3)
>> [1] " 9995"
>>
>> I'm still not sure why this is happening, though: The 'digits' parameter is
>> used to guess the number of characters of the output, but not to format the
>> actual number (i.e. all digits are still there anyway)? Is this case a bug,
>> or a feature? And if the latter, is it documented anywhere? I couldn't see
>> any hint of it in ?format, or ?options... The use of 'trim = TRUE' to fix
>> the problem seems to me like a workaround, not a real solution...
>>
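On R versions that still show this behaviour, one option is to avoid width padding altogether when turning numeric ids into strings. What follows is a minimal sketch using three base R calls; the vector `ids` and the variable names are illustrative and not taken from the thread.

```r
# Illustrative ids, stored as doubles (as they are inside an apply() call)
ids <- c(9994, 9995, 99995, 100000)

# 1. The workaround discussed in this thread: drop the padding explicitly
via_format <- format(ids, scientific = FALSE, trim = TRUE)

# 2. C-style integer formatting; the default width adds no padding
via_formatC <- formatC(ids, format = "d")

# 3. Explicit conversion to integer, then sprintf
via_sprintf <- sprintf("%d", as.integer(ids))

print(via_format)
```

All three give the same unpadded strings for whole-number ids; which one reads best is largely a matter of taste.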
>> Lastly, should I report this somewhere else?
>>
>> Thanks for your comment,
>> Mathieu.
>>
>>
>> On 08/01/2013 12:36 PM, William Dunlap wrote:
>>> I see the problem on both Linux and Windows, R-3.0.1.
>>>     >  vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
>>>     [1] "9994"  " 9995"
>>>     > vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
>>>     [1] "99994"  " 99995"
>>>     > vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
>>>     [1] "999994"  " 999995"
>>>
>>> The ones with the initial space are the ones that would round up to the next
>>> power of 10 when rounded to the requested number of significant digits:
>>>     > x <- as.numeric(1:5e5)
>>>     > z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
>>>     > i <- grep(" ", z)
>>>     > z[i]
>>>      [1] " 9995"  " 9996"  " 9997"  " 9998"  " 9999"  " 99950" " 99951" " 99952"
>>>      [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
>>>     [17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
>>>     [25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
>>>     [33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
>>>     [41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
>>>     [49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
>>>     > print(x[i], digits=3)
>>>      [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
>>>     [13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
>>>     [25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
>>>     [37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
>>>     [49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
>>>
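The rounding condition described above can be checked on its own, without relying on how `format` pads its output. This is a minimal sketch, assuming a hypothetical helper `gains_digit()` (not part of R or of this thread) that flags values whose integer part gets one digit longer once rounded with `signif`.

```r
# Hypothetical helper: TRUE when rounding x to `digits` significant digits
# pushes it up to the next power of 10 (e.g. 9995 -> 10000 for digits = 3).
gains_digit <- function(x, digits) {
  # sprintf("%.0f") avoids scientific notation when counting integer digits
  n_before <- nchar(sprintf("%.0f", x))
  n_after  <- nchar(sprintf("%.0f", signif(x, digits)))
  n_after > n_before
}

x <- as.numeric(1:5e5)
hits <- x[gains_digit(x, digits = 3)]

print(hits[hits < 1e4])  # the four-digit values that round up to 10000
print(length(hits))      # how many values in 1:5e5 meet the condition
```

Because the helper only looks at the rounding step, it should flag the same values whether or not the installed R version still adds the leading space.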
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mathieu Basille
>>>> Sent: Thursday, August 01, 2013 8:31 AM
>>>> To: R help
>>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>>>>
>>>> This problem does not seem to be widely encountered, but it affects at least
>>>> two users (both on Linux, maybe a hint here?). To me, it looks like a bug
>>>> (whether an R bug or an OS-related bug, I don't know). Should I forward it to
>>>> R-devel, or some other place where R gurus may have a chance to look at it?
>>>>
>>>> Mathieu.
>>>>
>>>>
>>>> On 07/30/2013 02:34 PM, arun wrote:
>>>>> Hi Mathieu,
>>>>> Yes, the original problem occurs on my system too. I am using R 3.0.1 on Linux Mint 15.
>>>>> I guess the default case would be trim=FALSE, but it still looks very strange, especially
>>>>> in ?apply(), as it starts from " 99995" onwards.
>>>>>
>>>>> sessionInfo()
>>>>> R version 3.0.1 (2013-05-16)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>     [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>>>>     [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>>>>     [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>>>>     [7] LC_PAPER=C                 LC_NAME=C
>>>>>     [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] stringr_0.6.2  reshape2_1.2.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] plyr_1.8    tools_3.0.1
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Mathieu Basille <basille.web at ase-research.org>
>>>>> To: arun <smartpink111 at yahoo.com>
>>>>> Cc: R help <r-help at r-project.org>
>>>>> Sent: Tuesday, July 30, 2013 2:29 PM
>>>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>>>>>
>>>>> Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
>>>>> of the problem, and this is the solution I'm currently using. However, it
>>>>> does not help to understand what the problem is, and what is the cause of it.
>>>>>
>>>>> Can you confirm that the original problem also occurs on your computer (and
>>>>> what is your OS)? It would be interesting since David is not able to
>>>>> reproduce the problem with Mac OS X.
>>>>> Mathieu.
>>>>>
>>>>>
>>>>> On 07/30/2013 02:15 PM, arun wrote:
>>>>>> Hi,
>>>>>> Try using trim=TRUE, in ?format()
>>>>>> options(digits=4)
>>>>>>
>>>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim=TRUE, scientific = FALSE))
>>>>>> df2$id2[99990:100010]
>>>>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>>>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>>>>> # [17] "100006" "100007" "100008" "100009" "100010"
>>>>>>
>>>>>>
>>>>>> id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
>>>>>> id2[99990:100010]
>>>>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>>>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>>>>> # [17] "100006" "100007" "100008" "100009" "100010"
>>>>>> A.K.
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: Mathieu Basille <basille.web at ase-research.org>
>>>>>> To: David Winsemius <dwinsemius at comcast.net>
>>>>>> Cc: r-help at r-project.org
>>>>>> Sent: Tuesday, July 30, 2013 2:07 PM
>>>>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>>>>>>
>>>>>> Thanks David for your interest. I have to admit that your answer puzzles me
>>>>>> even more than before. It seems that the underlying problem is way beyond
>>>>>> my R skills...
>>>>>>
>>>>>> The generation of id2 is indeed quite demanding, especially compared to a
>>>>>> simple 'as.character' call. Anyway, since it seems to be system specific,
>>>>>> here is the sessionInfo() that I forgot to attach to my first message:
>>>>>>
>>>>>> R version 3.0.1 (2013-05-16)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>
>>>>>> locale:
>>>>>>        [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>>>>>>        [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>>>>>>        [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>>>>>        [7] LC_PAPER=C                 LC_NAME=C
>>>>>>        [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> In brief: the latest stable R available under Debian Testing... Hopefully this
>>>>>> can help track down the problem.
>>>>>> Mathieu.
>>>>>>
>>>>>>
>>>>>> On 07/30/2013 01:58 PM, David Winsemius wrote:
>>>>>>>
>>>>>>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>>>>>>>
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> Here is a simple example in which the behaviour of 'format' does not make sense
>>>>>>>> to me. I have read the documentation and searched the archives, but nothing pointed
>>>>>>>> me in the right direction to understand this behaviour. Let's start with a simple
>>>>>>>> data frame:
>>>>>>>>
>>>>>>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>>>>>
>>>>>>>> Let's now create a new variable 'id2' which is the character representation of 'id'.
>>>>>>>> Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are
>>>>>>>> not formatted using their scientific representation (in this case 1e+05):
>>>>>>>>
>>>>>>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>>>>>>
>>>>>>>> Let's have a look at part of the result:
>>>>>>>>
>>>>>>>> df1$id2[99990:100010]
>>>>>>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>>>>>>>> [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>>>>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>>>>
>>>>>>> Some formatting processes are carried out by system functions. In this case I am
>>>>>>> unable to reproduce with the same code on Mac OS 10.7.5/R 3.0.1 Patched:
>>>>>>>
>>>>>>>> df1$id2[99990:100010]
>>>>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>>>>>>>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
>>>>>>> [17] "100006" "100007" "100008" "100009" "100010"
>>>>>>>
>>>>>>> (I did notice that generation of the id2 variable seemed to take an inordinately
>>>>>>> long time.)
>>>>>>>
>>>>>>> -- David.
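On the running time noted above: the row-wise `apply` makes one `format` call per row, whereas `format` is vectorized and can take the whole column at once. This is a minimal sketch on a smaller, made-up data frame (`df`, `slow` and `fast` are illustrative names); `trim = TRUE` is used in both calls so that the two results are directly comparable.

```r
set.seed(1)
df <- data.frame(x = rnorm(1000), y = rnorm(1000), id = 1:1000)

# Row-wise, as in the original post: one format() call per row
slow <- apply(df, 1, function(dfi) format(dfi["id"], scientific = FALSE, trim = TRUE))

# Vectorized: a single format() call on the whole column
fast <- format(df$id, scientific = FALSE, trim = TRUE)

# unname() drops any names carried over by apply() before comparing
print(identical(unname(slow), fast))
```

Without `trim = TRUE`, the vectorized call pads every element to a common width, which is the behaviour shown at the end of the original post.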
>>>>>>>>
>>>>>>>> So far, so good. Let's now play with the 'digits' option:
>>>>>>>>
>>>>>>>> options(digits = 4)
>>>>>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>>>>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>>>>>>> df2$id2[99990:100010]
>>>>>>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>>>>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>>>>>
>>>>>>>> Notice the extra leading space from 99995 to 99999? To make sure it only
>>>>>>>> happened there:
>>>>>>>>
>>>>>>>> df2$id2[which(df1$id2 != df2$id2)]
>>>>>>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>>>>>>>
>>>>>>>> And just to make sure it only occurs in an 'apply' call, here is the same directly
>>>>>>>> on a numeric vector:
>>>>>>>>
>>>>>>>> id2 <- format(1:110000, scientific = FALSE)
>>>>>>>> id2[99990:100010]
>>>>>>>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>>>>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>>>>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>>>>>>>
>>>>>>>> Here the leading spaces are for every number, which makes sense to me. Is there
>>>>>>>> anything I'm misinterpreting in the behaviour of 'format'?
>>>>>>>> Thanks in advance for any hint,
>>>>>>>> Mathieu.
>>>>>>>>
>>>>>>>>
>>>>>>>> PS: Some background for this question. It all comes from an Rmd document that
>>>>>>>> knitr consistently failed to process, while the R code was fine using batch or
>>>>>>>> interactive R. knitr uses 'options(digits = 4)' as opposed to 'options(digits = 7)'
>>>>>>>> by default in R, which made one of my functions throw an error with knitr, but not
>>>>>>>> with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in
>>>>>>>> 'format', but I still do not understand what's going on...
>>>>>>>> If you're interested, see here for more details on the original problem:
>>>>>>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>>>>>>>>
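Since the difference between knitr and interactive R comes down to the global 'digits' option, it can help to reproduce the knitr setting temporarily in an ordinary session. This is a minimal sketch, assuming a hypothetical helper `with_digits()` (not part of base R or knitr) that sets the option, evaluates an expression, and restores the previous value.

```r
# Hypothetical helper: evaluate `code` with a temporary 'digits' option
with_digits <- function(digits, code) {
  old <- options(digits = digits)    # options() returns the previous settings
  on.exit(options(old), add = TRUE)  # restore them even if `code` fails
  force(code)                        # the lazy argument is evaluated here
}

default_digits <- getOption("digits")

print(with_digits(4, format(pi)))                         # pi with 4 significant digits
print(with_digits(4, format(99995, scientific = FALSE)))  # the value from this thread
print(getOption("digits"))                                # back to the previous setting
```

Passing `digits` directly to `format`, as in the examples elsewhere in this thread, is an alternative that does not touch the global option at all.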
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> ~$ whoami
>>>>>>>> Mathieu Basille, PhD
>>>>>>>>
>>>>>>>> ~$ locate --details
>>>>>>>> University of Florida \\
>>>>>>>> Fort Lauderdale Research and Education Center
>>>>>>>> (+1) 954-577-6314
>>>>>>>> http://ase-research.org/basille
>>>>>>>>
>>>>>>>> ~$ fortune
>>>>>>>> « Le tout est de tout dire, et je manque de mots
>>>>>>>> Et je manque de temps, et je manque d'audace. »
>>>>>>>> -- Paul Éluard
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>> David Winsemius
>>>>>>> Alameda, CA, USA
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>

-- 

~$ whoami
Mathieu Basille, PhD

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
  -- Paul Éluard


