[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

William Dunlap wdunlap at tibco.com
Thu Aug 1 19:47:55 CEST 2013


You could report it as a bug at
  https://bugs.r-project.org/bugzilla3/
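
A report there would probably need a short, self-contained reproduction. The following is only a minimal sketch built from the examples quoted below; it prints what it finds rather than assuming a particular result, since the output may differ between R versions and platforms.

```r
# Minimal reproduction sketch for a bug report (values taken from this thread).
# No expected output is asserted, because it may vary across R versions.
x <- c(9994, 9995)
res <- vapply(x, function(v) format(v, scientific = FALSE, digits = 3), "")

print(res)
print(nchar(res))         # a leading space would show up as one extra character
print(R.version.string)   # worth including in any report
```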

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: Mathieu Basille [mailto:basille.web at ase-research.org]
> Sent: Thursday, August 01, 2013 10:31 AM
> To: R help
> Cc: William Dunlap
> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> 
> Nicely spotted, Bill! You went much farther than I could have. We can
> basically summarize the problem with the following simple example:
> 
>  > format(9994, digits = 3)
> [1] "9994"
>  > format(9995, digits = 3)
> [1] " 9995"
> 
> I'm still not sure why this is happening, though: The 'digits' parameter is
> used to guess the number of characters of the output, but not to format the
> actual number (i.e. all digits are still there anyway)? Is this case a bug,
> or a feature? And if the latter, is it documented anywhere? I couldn't see
> any hint of it in ?format, or ?options... The use of 'trim = TRUE' to fix
> the problem seems to me like a workaround, not a real solution...
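
For IDs that are known to be whole numbers, a few alternatives to 'trim = TRUE' avoid width padding altogether. This is a hedged sketch, not a statement about what 'format' should do; the vector 'ids' is made up for illustration.

```r
# Sketch: convert whole-number doubles to character without padding.
# `ids` is an illustrative vector, not data from the thread.
ids <- c(99994, 99995, 100000)

# Element-wise format() with trim = TRUE, as used in the thread's workaround
trimmed <- vapply(ids, function(v) format(v, scientific = FALSE, trim = TRUE), "")

# Alternatives that do not depend on options(digits = ...)
via_formatC <- formatC(ids, format = "d")   # integer-style formatting
via_sprintf <- sprintf("%.0f", ids)         # fixed notation, zero decimals

print(trimmed)
print(via_formatC)
print(via_sprintf)
```

Note that as.character() is not a drop-in replacement here, since it may render a double such as 100000 in scientific notation.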
> 
> Lastly, should I report this somewhere else?
> 
> Thanks for your comment,
> Mathieu.
> 
> 
> On 08/01/2013 12:36 PM, William Dunlap wrote:
> > I see the problem on both Linux and Windows, R-3.0.1.
> >    >  vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
> >    [1] "9994"  " 9995"
> >    > vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
> >    [1] "99994"  " 99995"
> >    > vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
> >    [1] "999994"  " 999995"
> >
> > The ones with the initial space are the ones that would round up to the next power of 10
> > when rounded to the requested number of significant digits:
> >    > x <- as.numeric(1:5e5)
> >    > z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
> >    > i <- grep(" ", z)
> >    > z[i]
> >     [1] " 9995"  " 9996"  " 9997"  " 9998"  " 9999"  " 99950" " 99951" " 99952"
> >     [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
> >    [17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
> >    [25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
> >    [33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
> >    [41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
> >    [49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
> >    > print(x[i], digits=3)
> >     [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> >    [13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> >    [25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> >    [37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> >    [49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
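
The same set of values can be found without calling 'format' at all, by asking which numbers gain a digit when rounded with 'signif'. This is a sketch under that reading of the observation above; 'rounds_up_to_next_power' is a hypothetical helper name, not an R function.

```r
# Sketch: flag whole numbers whose rounding to `digits` significant digits
# gains a digit (for example 9995 -> 10000 with digits = 3).
# `rounds_up_to_next_power` is a hypothetical helper, not part of R.
rounds_up_to_next_power <- function(x, digits) {
  n_before <- nchar(sprintf("%.0f", x))                  # digits before rounding
  n_after  <- nchar(sprintf("%.0f", signif(x, digits)))  # digits after rounding
  n_after > n_before
}

x <- as.numeric(1:5e5)
hits <- x[rounds_up_to_next_power(x, digits = 3)]
print(range(hits))
print(length(hits))   # compare with the number of strings in z[i] above
```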
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> >> Of Mathieu Basille
> >> Sent: Thursday, August 01, 2013 8:31 AM
> >> To: R help
> >> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>
> >> This problem does not seem to be widely popular, but at least affects two
> >> users (both on Linux, maybe a hint here?). To me, it looks like a bug (whether
> >> it is an R bug or an OS-related bug, I don't know). Should I forward it to
> >> R-devel, or some other place where R gurus may have a chance to look at it?
> >>
> >> Mathieu.
> >>
> >>
> >>> On 07/30/2013 02:34 PM, arun wrote:
> >>> Hi Mathieu
> >>> yes, the original problem occurs on my system too. I am using R 3.0.1 on Linux Mint 15. I
> >>> guess the default case would be trim=FALSE, but it still looks very strange, especially in
> >>> apply(), as it starts from " 99995" onwards.
> >>>
> >>> sessionInfo()
> >>> R version 3.0.1 (2013-05-16)
> >>> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>>
> >>> locale:
> >>>    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
> >>>    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
> >>>    [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
> >>>    [7] LC_PAPER=C                 LC_NAME=C
> >>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>> other attached packages:
> >>> [1] stringr_0.6.2  reshape2_1.2.2
> >>>
> >>> loaded via a namespace (and not attached):
> >>> [1] plyr_1.8    tools_3.0.1
> >>>
> >>> ----- Original Message -----
> >>> From: Mathieu Basille <basille.web at ase-research.org>
> >>> To: arun <smartpink111 at yahoo.com>
> >>> Cc: R help <r-help at r-project.org>
> >>> Sent: Tuesday, July 30, 2013 2:29 PM
> >>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>>
> >>> Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
> >>> of the problem, and this is the solution I'm currently using. However, it
> >>> does not help to understand what the problem is, and what is the cause of it.
> >>>
> >>> Can you confirm that the original problem also occurs on your computer (and
> >>> what is your OS)? It would be interesting since David is not able to
> >>> reproduce the problem with Mac OS X.
> >>> Mathieu.
> >>>
> >>>
> >>>> On 07/30/2013 02:15 PM, arun wrote:
> >>>> Hi,
> >>>> Try using trim=TRUE in format():
> >>>> options(digits=4)
> >>>>
> >>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim = TRUE, scientific = FALSE))
> >>>> df2$id2[99990:100010]
> >>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >>>> # [17] "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>>
> >>>> id2 <- format(1:110000, scientific = FALSE, trim = TRUE)
> >>>> id2[99990:100010]
> >>>> #  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >>>> #  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >>>> # [17] "100006" "100007" "100008" "100009" "100010"
> >>>> A.K.
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Mathieu Basille <basille.web at ase-research.org>
> >>>> To: David Winsemius <dwinsemius at comcast.net>
> >>>> Cc: r-help at r-project.org
> >>>> Sent: Tuesday, July 30, 2013 2:07 PM
> >>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>>>
> >>>> Thanks David for your interest. I have to admit that your answer puzzles me
> >>>> even more than before. It seems that the underlying problem is way beyond
> >>>> my R skills...
> >>>>
> >>>> The generation of id2 is indeed quite demanding, especially compared to a
> >>>> simple 'as.character' call. Anyway, since it seems to be system specific,
> >>>> here is the sessionInfo() that I forgot to attach to my first message:
> >>>>
> >>>> R version 3.0.1 (2013-05-16)
> >>>> Platform: x86_64-pc-linux-gnu (64-bit)
> >>>>
> >>>> locale:
> >>>>       [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
> >>>>       [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
> >>>>       [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
> >>>>       [7] LC_PAPER=C                 LC_NAME=C
> >>>>       [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> In brief: last stable R available under Debian Testing... Hopefully this
> >>>> can help tracking down the problem.
> >>>> Mathieu.
> >>>>
> >>>>
> >>>> On 07/30/2013 01:58 PM, David Winsemius wrote:
> >>>>>
> >>>>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
> >>>>>
> >>>>>> Dear list,
> >>>>>>
> >>>>>> Here is a simple example in which the behaviour of 'format' does not make sense to
> >>>>>> me. I have read the documentation and searched the archives, but nothing pointed me in
> >>>>>> the right direction to understand this behaviour. Let's start with a simple data frame:
> >>>>>>
> >>>>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>>>
> >>>>>> Let's now create a new variable 'id2' which is the character representation of 'id'.
> >>>>>> Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not
> >>>>>> formatted using their scientific representation (in this case 1e+05):
> >>>>>>
> >>>>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>>>
> >>>>>> Let's have a look at part of the result:
> >>>>>>
> >>>>>> df1$id2[99990:100010]
> >>>>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
> >>>>>> [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>
> >>>>> Some formatting processes are carried out by system functions. In this case I am
> >>>>> unable to reproduce with the same code on Mac OS 10.7.5/R 3.0.1 Patched:
> >>>>>
> >>>>>> df1$id2[99990:100010]
> >>>>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >>>>>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >>>>> [17] "100006" "100007" "100008" "100009" "100010"
> >>>>>
> >>>>> (I did notice that generation of the id2 variable seemed to take an inordinately long
> >>>>> time.)
> >>>>>
> >>>>> -- David.
> >>>>>>
> >>>>>> So far, so good. Let's now play with the 'digits' option:
> >>>>>>
> >>>>>> options(digits = 4)
> >>>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>>> df2$id2[99990:100010]
> >>>>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
> >>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>>
> >>>>>> Notice the extra leading space from 99995 to 99999? To make sure it only
> >>>>>> happened there:
> >>>>>>
> >>>>>> df2$id2[which(df1$id2 != df2$id2)]
> >>>>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
> >>>>>>
> >>>>>> And just to make sure it only occurs in an 'apply' call, here is the same directly on a
> >>>>>> numeric vector:
> >>>>>>
> >>>>>> id2 <- format(1:110000, scientific = FALSE)
> >>>>>> id2[99990:100010]
> >>>>>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> >>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>>
> >>>>>> Here the leading spaces are for every number, which makes sense to me. Is there
> >>>>>> anything I'm misinterpreting in the behaviour of 'format'?
> >>>>>> Thanks in advance for any hint,
> >>>>>> Mathieu.
> >>>>>>
> >>>>>>
> >>>>>> PS: Some background for this question. It all comes from an Rmd document that
> >>>>>> knitr consistently failed to process, while the R code was fine using batch or interactive
> >>>>>> R. knitr uses 'options(digits = 4)' as opposed to 'options(digits = 7)' by default in R, which
> >>>>>> made one of my functions throw an error with knitr, but not with batch or interactive R. I
> >>>>>> managed to solve the problem using 'trim = TRUE' in 'format', but I still do not
> >>>>>> understand what's going on...
> >>>>>> If you're interested, see here for more details on the original problem:
> >>>>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
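
To compare the knitr-like setting with the interactive default in one session, the 'digits' option can be changed temporarily and then restored. This is a small sketch; 'with_digits' is a hypothetical helper written for illustration, not a knitr or base R function.

```r
# Sketch: evaluate an expression under a temporary options(digits = ...) setting.
# `with_digits` is a hypothetical helper, not part of knitr or base R.
with_digits <- function(digits, expr) {
  old <- options(digits = digits)   # returns the previous setting
  on.exit(options(old), add = TRUE) # restore it when the function exits
  force(expr)                       # evaluated lazily, so it sees the new setting
}

# Same call under the two settings mentioned in the message above
print(with_digits(4, format(99995, scientific = FALSE)))
print(with_digits(7, format(99995, scientific = FALSE)))
```

Any difference between the two printed strings points at the 'digits' setting rather than at knitr itself.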
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> ~$ whoami
> >>>>>> Mathieu Basille, PhD
> >>>>>>
> >>>>>> ~$ locate --details
> >>>>>> University of Florida \\
> >>>>>> Fort Lauderdale Research and Education Center
> >>>>>> (+1) 954-577-6314
> >>>>>> http://ase-research.org/basille
> >>>>>>
> >>>>>> ~$ fortune
> >>>>>> « Le tout est de tout dire, et je manque de mots
> >>>>>> Et je manque de temps, et je manque d'audace. »
> >>>>>> -- Paul Éluard
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-help at r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>
> >>>>> David Winsemius
> >>>>> Alameda, CA, USA
> >>>>>


