[Rd] [External] Re: rpois(9, 1e10)
Tierney, Luke
luke-tierney using uiowa.edu
Mon Jan 20 04:00:58 CET 2020
R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R won't compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).
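To make that range concrete, here is a small R sketch (added for
illustration, not part of the original message). It checks the limit
reported by .Machine$integer.max and shows what happens one step past
it. One caveat: R reserves the bit pattern -2^31 for NA_integer_, so
the smallest usable integer value is -2^31 + 1.

```r
# Largest representable R integer is 2^31 - 1.
imax <- .Machine$integer.max
stopifnot(identical(imax, 2147483647L))

# Integer arithmetic that leaves the range yields NA (with a warning).
above <- suppressWarnings(imax + 1L)
below <- suppressWarnings(-imax - 1L)  # -2^31 is the NA_integer_ pattern
stopifnot(is.na(above), is.na(below))

# Mixing in a double promotes the result to double, which holds
# whole numbers exactly up to 2^53.
big <- imax + 1
stopifnot(is.double(big), big == 2^31)
```

This is the same limit behind the NA results from rpois(9, 1e10)
quoted further down: the simulated counts do not fit in an R integer.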
It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how we might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.
A few notes inline below:
On Sun, 19 Jan 2020, Spencer Graves wrote:
> On my Mac:
>
>
> str(.Machine)
> ...
> $ integer.max : int 2147483647
> $ sizeof.long : int 8
> $ sizeof.longlong : int 8
> $ sizeof.longdouble : int 16
> $ sizeof.pointer : int 8
>
>
> On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.
One of many annoyances of Windows -- done for compatibility with
ancient Windows apps.
> Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31. Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)? And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?
double precision is a floating point concept, not related to integers.
If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 means 64 bits.
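As a minimal sketch of that check (added for illustration; the
variable name `bits` is just a placeholder), the pointer size in bytes
can be converted to a bit width directly:

```r
# sizeof.pointer is reported in bytes; multiply by 8 to get bits.
bits <- 8L * .Machine$sizeof.pointer
cat("This R build uses", bits, "bit pointers\n")

# The integer limit does not depend on the pointer size.
cat("integer.max is", .Machine$integer.max, "\n")
```

Note that integer.max is the same on 32 bit and 64 bit builds, since
it follows the C 'int' type rather than the pointer width.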
Best,
luke
>
>
> Spencer
>
>
> On 2020-01-19 15:41, Avraham Adler wrote:
>> Floor (maybe round) of non-negative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for R64 bit, just for purity
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> <spencer.graves using prodsyse.com> wrote:
>>
>>
>>
>> On 2020-01-19 13:01, Avraham Adler wrote:
>>> Crazy thought, but being that a sum of Poissons is Poisson in the
>>> sum, can you break your “big” simulation into the sum of a few
>>> smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>> I don't perceive that as feasible. Once I found what was
>> generating NAs, it was easy to code a function to return
>> pseudo-random numbers using the standard normal approximation to
>> the Poisson for those extreme cases. [For a Poisson with mean =
>> 1e6, for example, the skewness (third standardized moment) is
>> 0.001. At least for my purposes, that should be adequate.][1]
>>
>>
>> What are the negative consequences of having rpois return
>> numerics that are always nonnegative?
>>
>>
>> Spencer
>>
>>
>> [1] In the code I reported before, I just changed the threshold
>> of 1e6 to 0.5*.Machine$integer.max. On my Mac,
>> .Machine$integer.max = 2147483647 = 2^31 - 1 > 1e9. That still means
>> that a Poisson distributed pseudo-random number just under that
>> would have to be over 23000 standard deviations above the mean to
>> exceed .Machine$integer.max.
>>
>>>
>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>> <spencer.graves using prodsyse.com> wrote:
>>>
>>> This issue arose for me in simulations to estimate
>>> confidence, prediction, and tolerance intervals from glm(.,
>>> family=poisson) fits embedded in a BMA::bic.glm fit using a
>>> simulate.bic.glm function I added to the development version
>>> of Ecfun, available at https://github.com/sbgraves237/Ecfun.
>>> This is part of a vignette I'm developing, available at
>>> https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd.
>>> This includes a simulated mean of a mixture of Poissons that
>>> exceeds 2e22. It doesn't seem unreasonable to me to have
>>> rpois output numerics rather than integers when a number
>>> simulated exceeds .Machine$integer.max. And it does seem to
>>> make less sense in such cases to return NAs.
>>>
>>>
>>> Alternatively, might it make sense to add another
>>> argument to rpois to give the user the choice? E.g., an
>>> argument "bigOutput" with (I hope) default = "numeric" and
>>> "NA" as a second option. Or NA is the default, so no code
>>> that relied on that feature of the current code would be broken
>>> by the change. If someone wanted to use arbitrary precision
>>> arithmetic, they could write their own version of this
>>> function with "arbitraryPrecision" as an optional value for
>>> the "bigOutput" argument.
>>>
>>>
>>> Comments?
>>> Thanks,
>>> Spencer Graves
>>>
>>>
>>>
>>> On 2020-01-19 10:28, Avraham Adler wrote:
>>>> Technically, lambda can always be numeric. It is the
>>>> observations which must be integral.
>>>>
>>>> Would hitting everything larger than maxint or maxlonglong
>>>> with floor or round fundamentally change the distribution?
>>>> Well, yes, but enough that it would matter over process risk?
>>>>
>>>> Avi
>>>>
>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>> <btyner using gmail.com> wrote:
>>>>
>>>> So imagine rpois is changed, such that the storage mode of its return
>>>> value is sometimes integer and sometimes numeric. Then imagine the case
>>>> where lambda is itself a realization of a random variable. Do we really
>>>> want the storage mode to inherit that randomness?
>>>>
>>>>
>>>> On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>> > Maybe there should be code for 64 bit R to use long long or the like?
>>>> >
>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>> > <spencer.graves using prodsyse.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On 2020-01-19 09:34, Benjamin Tyner wrote:
>>>> > >> Hello, All:
>>>> > >>
>>>> > >>
>>>> > >> Consider:
>>>> > >>
>>>> > >>
>>>> > >> Browse[2]> set.seed(1)
>>>> > >> Browse[2]> rpois(9, 1e10)
>>>> > >> NAs produced
>>>> > >> [1] NA NA NA NA NA NA NA NA NA
>>>> > >>
>>>> > >>
>>>> > >> Should this happen?
>>>> > >>
>>>> > >>
>>>> > >> I think that for, say, lambda>1e6, rpois should return
>>>> > >> rnorm(., lambda, sqrt(lambda)).
>>>> > > But need to implement carefully; rpois should always return a
>>>> > > non-negative integer, whereas rnorm always returns numeric...
>>>> > >
>>>> >
>>>> > Thanks for the reply.
>>>> >
>>>> >
>>>> > However, I think it's not acceptable to get an NA from a number
>>>> > that cannot be expressed as an integer. Whenever a randomly generated
>>>> > number would exceed .Machine$integer.max, the choice is between
>>>> > returning NA or a non-integer numeric. Consider:
>>>> >
>>>> >
>>>> > > 2*.Machine$integer.max
>>>> > [1] 4294967294
>>>> > > as.integer(2*.Machine$integer.max)
>>>> > [1] NA
>>>> > Warning message:
>>>> > NAs introduced by coercion to integer range
>>>> >
>>>> >
>>>> > I'd rather have the non-integer numeric.
>>>> >
>>>> >
>>>> > Spencer
>>>> >
>>>> > ______________________________________________
>>>> > R-devel using r-project.org mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> >
>>>> > --
>>>> > Sent from Gmail Mobile
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney using uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu