[Rd] [External] Re: rpois(9, 1e10)
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Jan 20 10:26:47 CET 2020
>>>>> Spencer Graves
>>>>> on Sun, 19 Jan 2020 21:35:04 -0600 writes:
> Thanks to Luke and Avi for their comments. I wrapped "round" around the
> call to "rnorm" inside my "rpois.". For "lambda" really big, that
> "round" won't do anything. However, it appears to give integers in
> floating point representation that are larger than
> .Machine$integer.max. That sounds very much like what someone would
> want. Spencer
Coming late here -- after enjoying a proper weekend ;-) --
I have been agreeing (with Spencer, IIUC) on this for a long
time (~ 3 yrs, or more?), namely that I've come to see it as a
"design bug" that rpois() {and similar} must return return typeof() "integer".
More strongly, I'm actually pretty convinced they should return
(integer-valued) double instead of NA_integer_ and for that
reason should always return double:
Even if we have (hopefully) a native 64bit integer in R,
2^64 is still teeny tiny compared .Machine$double.max
(and then maybe we'd have .Machine$longdouble.max which would
be considerably larger than double.max unless on Windows, where
the wise men at Microsoft decided to keep their workload simple
by defining "long double := double" - as 'long double'
unfortunately is not well defined by C standards)
Martin
> On 2020-01-19 21:00, Tierney, Luke wrote:
>> R uses the C 'int' type for its integer data and that is pretty much
>> universally 32 bit these days. In fact R wont' compile if it is not.
>> That means the range for integer data is the integers in [-2^31,
>> +2^31).
>>
>> It would be good to allow for a larger integer range for R integer
>> objects, and several of us are thinking about how me might get there.
>> But it isn't easy to get right, so it may take some time. I doubt
>> anything can happen for R 4.0.0 this year, but 2021 may be possible.
>>
>> I few notes inline below:
>>
>> On Sun, 19 Jan 2020, Spencer Graves wrote:
>>
>>> On my Mac:
>>>
>>>
>>> str(.Machine)
>>> ...
>>> $ integer.max : int 2147483647
>>> $ sizeof.long : int 8
>>> $ sizeof.longlong : int 8
>>> $ sizeof.longdouble : int 16
>>> $ sizeof.pointer : int 8
>>>
>>>
>>> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
>>> the same as on my Mac.
>> One of many annoyances of Windows -- done for compatibility with
>> ancient Window apps.
>>
>>> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
>>> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
>>> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
>>> 16 = 4 words of 4 bytes each?
>> double precision is a floating point concept, not related to integers.
>>
>> If you want to figure out whether you are running a 32 bit or 64 bit R
>> look at sizeof.pointer -- 4 means 32 bits, 8 64 bits.
>>
>> Best,
>>
>> luke
>>
>>
>>>
>>> Spencer
>>>
>>>
>>> On 2020-01-19 15:41, Avraham Adler wrote:
>>>> Floor (maybe round) of non-negative numerics, though. Poisson should
>>>> never have anything after decimal.
>>>>
>>>> Still think it’s worth allowing long long for R64 bit, just for purity
>>>> sake.
>>>>
>>>> Avi
>>>>
>>>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>>>> <spencer.graves using prodsyse.com <mailto:spencer.graves using prodsyse.com>> wrote:
>>>>
>>>>
>>>>
>>>> On 2020-01-19 13:01, Avraham Adler wrote:
>>>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>>>> sum, can you break your “big” simulation into the sum of a few
>>>>> smaller ones? Or is the order of magnitude difference just too great?
>>>>
>>>> I don't perceive that as feasible. Once I found what was
>>>> generating NAs, it was easy to code a function to return
>>>> pseudo-random numbers using the standard normal approximation to
>>>> the Poisson for those extreme cases. [For a Poisson with mean =
>>>> 1e6, for example, the skewness (third standardized moment) is
>>>> 0.001. At least for my purposes, that should be adequate.][1]
>>>>
>>>>
>>>> What are the negative consequences of having rpois return
>>>> numerics that are always nonnegative?
>>>>
>>>>
>>>> Spencer
>>>>
>>>>
>>>> [1] In the code I reported before, I just changed the threshold
>>>> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
>>>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>>>> that a Poisson distributed pseudo-random number just under that
>>>> would have to be over 23000 standard deviations above the mean to
>>>> exceed .Machine$integer.max.
>>>>
>>>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>>>> <spencer.graves using prodsyse.com
>>>>> <mailto:spencer.graves using prodsyse.com>> wrote:
>>>>>
>>>>> This issue arose for me in simulations to estimate
>>>>> confidence, prediction, and tolerance intervals from glm(.,
>>>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>>>> simulate.bic.glm function I added to the development version
>>>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun"
>>>>> <https://github.com/sbgraves237/Ecfun>. This is part of a
>>>>> vignette I'm developing, available at
>>>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>>>>> <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>>>>> This includes a simulated mean of a mixture of Poissons that
>>>>> exceeds 2e22. It doesn't seem unreasonable to me to have
>>>>> rpois output a numerics rather than integers when a number
>>>>> simulated exceeds .Machine$integer.max. And it does seem to
>>>>> make less sense in such cases to return NAs.
>>>>>
>>>>>
>>>>> Alternatively, might it make sense to add another
>>>>> argument to rpois to give the user the choice? E.g., an
>>>>> argument "bigOutput" with (I hope) default = "numeric" and
>>>>> "NA" as a second option. Or NA is the default, so no code
>>>>> that relied that feature of the current code would be broken
>>>>> by the change. If someone wanted to use arbitrary precision
>>>>> arithmetic, they could write their own version of this
>>>>> function with "arbitraryPrecision" as an optional value for
>>>>> the "bigOutput" argument.
>>>>>
>>>>>
>>>>> Comments?
>>>>> Thanks,
>>>>> Spencer Graves
>>>>>
>>>>>
>>>>>
>>>>> On 2020-01-19 10:28, Avraham Adler wrote:
>>>>> Technically, lambda can always be numeric. It is the
>>>>> observations which must be integral.
>>>>>>
>>>>> Would hitting everything larger than maxint or maxlonglong
>>>>> with floor or round fundamentally change the distribution?
>>>>> Well, yes, but enough that it would matter over process risk?
>>>>>>
>>>>> Avi
>>>>>>
>>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>>> <btyner using gmail.com <mailto:btyner using gmail.com>> wrote:
>>>>>>
>>>>> So imagine rpois is changed, such that the storage mode
>>>>> of its return
>>>>> value is sometimes integer and sometimes numeric. Then
>>>>> imagine the case
>>>>> where lambda is itself a realization of a random
>>>>> variable. Do we really
>>>>> want the storage mode to inherit that randomness?
>>>>>>
>>>>>>
>>>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>>> > Maybe there should be code for 64 bit R to use long
>>>>> long or the like?
>>>>> >
>>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>>> > <spencer.graves using prodsyse.com
>>>>> <mailto:spencer.graves using prodsyse.com>
>>>>> <mailto:spencer.graves using prodsyse.com
>>>>> <mailto:spencer.graves using prodsyse.com>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 2020-01-19 09:34, Benjamin Tyner wrote:
>>>>> > >>
>>>>> >
>>>>> ------------------------------------------------------------------------
>>>>> > >> Hello, All:
>>>>> > >>
>>>>> > >>
>>>>> > >> Consider:
>>>>> > >>
>>>>> > >>
>>>>> > >> Browse[2]> set.seed(1)
>>>>> > >> Browse[2]> rpois(9, 1e10)
>>>>> > >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>>>> > >>
>>>>> > >>
>>>>> > >> Should this happen?
>>>>> > >>
>>>>> > >>
>>>>> > >> I think that for, say, lambda>1e6,
>>>>> rpois should return
>>>>> > rnorm(.,
>>>>> > >> lambda, sqrt(lambda)).
>>>>> > > But need to implement carefully; rpois should
>>>>> always return a
>>>>> > > non-negative integer, whereas rnorm always
>>>>> returns numeric...
>>>>> > >
>>>>> >
>>>>> > Thanks for the reply.
>>>>> >
>>>>> >
>>>>> > However, I think it's not acceptable to get
>>>>> an NA from a
>>>>> > number
>>>>> > that cannot be expressed as an integer. Whenever
>>>>> a randomly
>>>>> > generated
>>>>> > number would exceed .Machine$integer.max, the
>>>>> choice is between
>>>>> > returning NA or a non-integer numeric. Consider:
>>>>> >
>>>>> >
>>>>> > > 2*.Machine$integer.max
>>>>> > [1] 4294967294
>>>>> > > as.integer(2*.Machine$integer.max)
>>>>> > [1] NA
>>>>> > Warning message:
>>>>> > NAs introduced by coercion to integer range
>>>>> >
>>>>> >
>>>>> > I'd rather have the non-integer numeric.
>>>>> >
>>>>> >
>>>>> > Spencer
>>>>> >
>>>>> > ______________________________________________
>>>>> > R-devel using r-project.org <mailto:R-devel using r-project.org>
>>>>> <mailto:R-devel using r-project.org
>>>>> <mailto:R-devel using r-project.org>> mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> >
>>>>> > --
>>>>> > Sent from Gmail Mobile
>>>>>>
>>>>> --
>>>>> Sent from Gmail Mobile
>>>>> --
>>>>> Sent from Gmail Mobile
>>>> --
>>>> Sent from Gmail Mobile
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list