[Rd] Randomness not due to seed
Dirk Eddelbuettel
edd at debian.org
Wed Jul 20 18:20:49 CEST 2011
On 20 July 2011 at 18:02, peter dalgaard wrote:
|
| On Jul 20, 2011, at 15:38 , Dirk Eddelbuettel wrote:
|
| >
| > On 20 July 2011 at 14:03, Jeroen Ooms wrote:
| > | >> I think Bill Dunlap's answer addressed it: the claim appears to be false.
| > |
| > | Here is another example where there is randomness that is not due to
| > | the seed. On the same machine, the same R binary, but through another
| > | interface. First directly in the shell:
| > |
| > | > sessionInfo()
| > | R version 2.13.1 (2011-07-08)
| > | Platform: i686-pc-linux-gnu (32-bit)
| > |
| > | locale:
| > | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
| > | [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
| > | [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
| > | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
| > | [9] LC_ADDRESS=C LC_TELEPHONE=C
| > | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
| > |
| > | attached base packages:
| > | [1] stats graphics grDevices utils datasets methods base
| > |
| > | > set.seed(123)
| > | > print(coef(lm(dist~speed, data=cars)),digits=22)
| > | (Intercept) speed
| > | -17.579094890510951643137 3.932408759124087715975
| >
| > That's PBKAC --- even double precision does NOT get you 22 digits precision.
|
| Hmm, yes, but you would expect the SAME function on the SAME data to yield the same floating point number, and give the SAME printout on the SAME R on the SAME hardware...
|
| FWIW all the Mac versions that I can access give the same results as the eclipse version.
|
| Let's look at the numbers side-by-side
|
|   -17.579094890510951643137   3.932408759124087715975
|   -17.57909489051087703615    3.93240875912408460735
|                   !                           !
|    12.345678901234567890123   1.234567890123456789012
|
| so we're seeing differences around the 15th/16th significant digit. This is consistent with a difference of about one unit of least precision in the actual objects, but there could conceivably be other explanations, e.g. the print() function picking up random garbage. Jeroen: Could you save() the results from the two cases, load() them in a new session and compute the difference?
Yes, 15 to 16 is common. I should have added that to my post when I said '22
is too much'. I also did not want to give the impression that nine is what one
gets: nine is the minimum as per the libc docs I quoted, but as you
illustrate, 15 to 16 can often be had.
Thanks for the follow-up.
Dirk
| > You may want to read up on 'What Every Computer Scientist Should Know About
| > Floating-Point Arithmetic' by Goldberg (a true internet classic) and ponder
| > why a common choice for the various 'epsilon' settings of general
| > convergence is one of the constants supplied by the OS and/or its C
| > library. R has
| >
| > #define SINGLE_EPS FLT_EPSILON
| > [...]
| > #define DOUBLE_EPS DBL_EPSILON
| >
| > in Constants.h. You can then chase the definition of FLT_EPSILON and
| > DBL_EPSILON through your system headers (which is a good exercise).
| >
| > One place you may end up is the manual. The following is from the GNU libc
| > documentation on "Floating Point Parameters":
| >
| > FLT_EPSILON
| > This is the minimum positive floating point number of type float such that
| > 1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than 1E-5.
| >
| > DBL_EPSILON
| > LDBL_EPSILON
| > These are similar to FLT_EPSILON, but for the data types double and long
| > double, respectively. The type of the macro's value is the same as the type
| > it describes. The values are not supposed to be greater than 1E-9.
| >
| > So there -- nine digits.
| >
| > Dirk
| >
| >
| > | # And this is through eclipse (java)
| > |
| > | > sessionInfo()
| > | R version 2.13.1 (2011-07-08)
| > | Platform: i686-pc-linux-gnu (32-bit)
| > |
| > | locale:
| > | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
| > | [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
| > | [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
| > | [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
| > | [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
| > | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
| > |
| > | attached base packages:
| > | [1] stats graphics grDevices utils datasets methods base
| > |
| > | other attached packages:
| > | [1] rj_0.5.2-1
| > |
| > | loaded via a namespace (and not attached):
| > | [1] rJava_0.9-1 tools_2.13.1
| > |
| > | > set.seed(123)
| > | > print(coef(lm(dist~speed, data=cars)),digits=22)
| > | (Intercept) speed
| > |
|
| > |
| > | ______________________________________________
| > | R-devel at r-project.org mailing list
| > | https://stat.ethz.ch/mailman/listinfo/r-devel
| >
| > --
| > Gauss once played himself in a zero-sum game and won $50.
| > -- #11 at http://www.gaussfacts.com
| >
| > ______________________________________________
| > R-devel at r-project.org mailing list
| > https://stat.ethz.ch/mailman/listinfo/r-devel
|
| --
| Peter Dalgaard
| Center for Statistics, Copenhagen Business School
| Solbjerg Plads 3, 2000 Frederiksberg, Denmark
| Phone: (+45)38153501
| Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
|
--
Gauss once played himself in a zero-sum game and won $50.
-- #11 at http://www.gaussfacts.com