[R] [EXTERNAL] R-help Digest, Vol 260, Issue 19

Jorgen Harmse JH@rm@e @end|ng |rom roku@com
Thu Oct 24 00:29:12 CEST 2024


I think that Stevie Pederson has the right idea, but it is not obvious what the threshold should be. Example:

> n <- 2428716; sum(rep(1/n,n)) - 1
[1] -3.297362e-14

I assume that equally large errors in the other direction are also possible.
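
For scale, the error above is roughly 148 times .Machine$double.eps, so a threshold of a single epsilon would not cover it. A rough sketch for checking how the error varies with n follows. The helper name `rounding_error` and the chosen values of n are only illustrative, and the numbers it prints will depend on the platform.

```r
# Illustrative helper (hypothetical name): signed rounding error of
# sum(rep(1/n, n)) relative to the exact value 1.
rounding_error <- function(n) sum(rep(1 / n, n)) - 1

# Class counts mentioned in the original report, plus the larger n above.
ns <- c(9, 11, 18, 20, 2428716)
errs <- vapply(ns, rounding_error, numeric(1))

# Show each error both raw and in units of machine epsilon.
data.frame(n = ns, error = errs, eps_units = errs / .Machine$double.eps)
```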

Regards,
Jorgen Harmse.

----------------------------------------------------------------------

Message: 1
Date: Wed, 23 Oct 2024 15:56:00 +1030
From: Stevie Pederson <stephen.pederson.au using gmail.com>
To: r-help using r-project.org
Subject: [R] OSX-specific Bug in randomForest
Message-ID:
        <CAGCDhaXOMhreAUx=60twjtGhpRJ1NV5Xf5cdUJ7REBqn6zQ1TA using mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

It appears there is an OSX-specific bug in the function
`randomForest.default()`. Going by the source code at
https://github.com/cran/randomForest/blob/master/R/randomForest.default.R
the bug is on line 103.

If the vector `cutoff` is formed using `cutoff <- rep(1/9, 9)` (line #101)
the test on line 103 will fail on OSX as the sum is greater than 1 due to
machine precision errors.

sum(rep(1 / 9, 9)) - 1
# [1] 2.220446e-16

This will actually occur whenever the number of factor levels
(nclass) is 9, 11, 18, 20, etc. The problem does not occur on Linux, and I
haven't tested on Windows.

A suggestion may be to change the opening test

if (sum(cutoff) > 1 || ...)

to

if (sum(cutoff) - 1 > .Machine$double.eps || ...)

however, I'm sure there's a more elegant way to do this.
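
One possible shape for such a check, as a sketch only: compare the excess over 1 against a tolerance rather than against zero. The helper name `sum_exceeds_one` is hypothetical and not part of randomForest. The default tolerance of sqrt(.Machine$double.eps) is an assumption borrowed from the default used by all.equal(), and it may not be the right choice for every case.

```r
# Hypothetical helper, not part of randomForest: TRUE only when
# sum(cutoff) exceeds 1 by more than `tol`.
# The default tolerance is an assumption (same default as all.equal()).
sum_exceeds_one <- function(cutoff, tol = sqrt(.Machine$double.eps)) {
  sum(cutoff) - 1 > tol
}

sum_exceeds_one(rep(1 / 9, 9))  # rounding noise only, so not flagged
sum_exceeds_one(c(0.6, 0.5))    # genuinely sums to more than 1, so flagged
```

A looser tolerance like this leaves room for rounding errors larger than a single epsilon, at the cost of accepting cutoffs that overshoot 1 by a tiny real amount.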

Thanks in advance
