[R] NAs produced by integer overflow, but only some time ...
William Dunlap
wdun|@p @end|ng |rom t|bco@com
Wed May 9 17:02:41 CEST 2018
Printing a number does not show whether it is stored
as a 32-bit integer or as a 64-bit floating point value.
Use, e.g., str() or class() to see.
> str(length(runif(3)))
int 3
> str(length(runif(3)) + 1)
num 4
> str(length(runif(3)) + 1L)
int 4
> str( 3L * 3L )
int 9
> str( 3L ^ 2L )
num 9
You are right that the arithmetic operators map a pair
of integer arguments to different types: the power and division
operators map them to double precision, while the addition,
multiplication, and subtraction operators map them to integer
results (giving NAs if the result cannot fit into 32 bits).
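A small sketch of the above, using the value 61224 from the original
question. The function name yules.k3 is hypothetical (it is not from the
thread); it only illustrates one possible workaround, coercing the length
to double before multiplying.

```r
m1 <- 61224L                      # integer, as length() returns for ordinary vectors

typeof(m1 + m1)                   # "integer"
typeof(m1 - 1L)                   # "integer"
typeof(m1 / 2L)                   # "double"
typeof(m1 ^ 2L)                   # "double"

# Integer multiplication overflows to NA (with a warning) ...
prod_int <- suppressWarnings(m1 * m1)
is.na(prod_int)                   # TRUE

# ... while converting one operand to double first avoids the overflow.
prod_dbl <- as.numeric(m1) * m1
prod_dbl == 61224^2               # TRUE

# Hypothetical variant of the poster's function: keep m1 as a double
# so that m1 * m1 is computed in floating point.
yules.k3 <- function(input) {
  m1 <- as.numeric(length(input))
  temp <- table(table(input))
  m2 <- sum(temp * as.numeric(names(temp))^2)
  10000 * (m2 - m1) / (m1 * m1)
}
```

Calling yules.k3(curr.lemmas) should then avoid the integer-overflow
warning, since no integer-by-integer multiplication remains.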
Perhaps it was a mistake to include the integer type, but
at the time S was developed it made sense.
As for table(table(x)) being an unnatural construct, I use it
all the time instead of anyDuplicated to see the pattern of
duplications.
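For instance, with some made-up data (the vector x below is purely
illustrative), anyDuplicated() only reports the position of the first
duplicate, whereas table(table(x)) summarizes how often each
multiplicity occurs:

```r
x <- c("a", "b", "b", "c", "c", "c", "d")   # made-up data

# Index of the first element that repeats an earlier one (0 if none).
first_dup <- anyDuplicated(x)               # 3

# Inner table: count per value (a = 1, b = 2, c = 3, d = 1).
# Outer table: how many values occur once, twice, three times, ...
pattern <- table(table(x))
pattern
```

Here the names of pattern are the multiplicities ("1", "2", "3") and its
entries say that two values occur once, one occurs twice, and one occurs
three times.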
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, May 9, 2018 at 12:04 AM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:
> a) Numeric values may be either integers (signed 32 bit) or double
> precision (53 bit mantissa).
>
> b) Double precision constants are numeric with no decoration (e.g. 61224).
> Integer constants have an L (e.g. 61224L).
>
> c) 61224*61224 > 2^31-1 so that answer cannot fit into an integer.
>
> d) Exponentiation is a floating point operation so the result of 61224L^2L
> is a floating point answer that CAN fit into the 53bit mantissa of a double
> precision value, so no overflow occurs.
>
> e) Defining a function like yules.k1 and never showing how you called it
> does not constitute a reproducible example. To avoid such gaffes you can
> use the reprex package to confirm that the errors shown in your question
> are in fact reproducible.
>
> f) On this mailing list, the fact that you are using RStudio is at best
> irrelevant, and at worst off-topic. If you don't see problems running your
> reproducible example from R in the terminal then the question probably
> belongs in the RStudio support forum. This is another reason to use the
> reprex package to check your reproducibility (this works even if you invoke
> it from RStudio).
>
> g) Calling table on the result of table must be one of the more bizarre
> calculation sequences I have ever seen in R. I hope you are getting the
> answers you are expecting when you do use double precision numeric values.
> Also, using the prefix form of multiplication is unnecessarily obscure, and
> your use of the return function at the end of your function is redundant.
>
> On May 8, 2018 7:54:26 PM PDT, "Stefan Th. Gries" <stgries using gmail.com>
> wrote:
> >I have a problem with integer overflow that I cannot understand.
> >
> >I have a character vector curr.lemmas with the following properties:
> >
> >length(curr.lemmas) # 61224
> >length(unique(curr.lemmas)) # 2652
> >
> >That vector is the input to the following function:
> >
> >yules.k1 <- function(input) {
> > m1 <- length(input); temp <- table(table(input))
> > m2 <- sum("*"(temp, as.numeric(names(temp))^2))
> > return(10000*(m2-m1) / (m1*m1))
> >}
> >
> >When I run this, I get the following output:
> >
> >[1] NA
> >Warning message:
> >In m1 * m1 : NAs produced by integer overflow
> >
> >But when I change the function to this one by just replacing m1*m1 by
> >m1^2 ...
> >
> >yules.k2 <- function(input) {
> > m1 <- length(input); temp <- table(table(input))
> > m2 <- sum("*"(temp, as.numeric(names(temp))^2))
> > return(10000*(m2-m1) / (m1^2))
> >}
> >
> >yules.k2(curr.lemmas) # -> 157.261
> >
> >I am using RStudio 1.1.447 and here's my sessionInfo
> >######################
> >R version 3.4.4 (2018-03-15)
> >Platform: x86_64-pc-linux-gnu (64-bit)
> >Running under: Linux Mint 18.3
> >
> >Matrix products: default
> >BLAS: /usr/lib/openblas-base/libblas.so.3
> >LAPACK: /usr/lib/libopenblasp-r0.2.18.so
> >
> >locale:
> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> >LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> >LC_MONETARY=en_US.UTF-8
> > [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
> > LC_ADDRESS=C LC_TELEPHONE=C
> >[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> >attached base packages:
> >[1] stats graphics grDevices utils datasets methods base
> >
> >loaded via a namespace (and not attached):
> > [1] compiler_3.4.4 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
> >htmltools_0.3.6 tools_3.4.4 yaml_2.1.19 Rcpp_0.12.16
> >stringi_1.2.2
> >[10] rmarkdown_1.9 knitr_1.20 stringr_1.3.0 digest_0.6.15
> >evaluate_0.10.1
> >######################
> >
> >What is even more puzzling is that one time I ran R in the console of
> >Geany and this happened:
> >
> >> m1
> >[1] 61224
> >> 61224*61224
> >[1] 3748378176
> >> 61224^2
> >[1] 3748378176
> >> m1*m1
> >[1] NA
> >Warning message:
> >In m1 * m1 : NAs produced by integer overflow
> >> m1^2
> >[1] 3748378176
> >
> >That is, the multiplication worked with the numbers but not the
> >numeric vectors; the above is literally copied from the console. Why
> >is that happening?
> >
> >Any help would be much appreciated!
> >STG
> >--
> >Stefan Th. Gries
> >----------------------------------
> >Univ. of California, Santa Barbara
> >http://tinyurl.com/stgries
> >
> >______________________________________________
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>
>