[R] approxfun-problems (yleft and yright ignored)

Sat Sep 11 17:13:31 CEST 2010

>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Sat, 11 Sep 2010 10:32:38 -0400 writes:

    > On 11/09/2010 10:04 AM, Martin Maechler wrote:
    >>>>>>> "SW" == Samuel Wuest <wuests at tcd.ie>
    >>>>>>> on Thu, 26 Aug 2010 14:34:26 +0100 writes:
    >> 
    SW> Hi Greg,
    SW> thanks for the suggestion:
    >> 
    SW> I have attached some small dataset that can be used to reproduce the
    SW> odd behavior of the approxfun-function.
    >> 
    SW> If it gets stripped off my email, it can also be downloaded at:
    SW> http://bioinf.gen.tcd.ie/approx.data.Rdata
    >> 
    SW> Strangely, the problem seems specific to the data structure in my
    SW> expression set, when I use simulated data, everything worked fine.
    >> 
    SW> Here is some code that I run and resulted in the strange output that I
    SW> have described in my initial post:
    >> 
    >> >> ### load the data: a list called approx.data
    >> >> load(file="approx.data.Rdata")
    >> >> ### contains the slots "x", "y", "input"
    >> >> names(approx.data)
    SW> [1] "x"     "y"     "input"
    >> >> ### with y ranging between 0 and 1
    >> >> range(approx.data$y)
    SW> [1] 0 1
    >> >> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
    >> >> range(approx.data$x)
    SW> [1] 3.098444 7.268812
    >> >> range(approx.data$input)
    SW> [1]  3.329408 13.026700
    >> >> 
    >> >> 
    >> >> ### generate the interpolation function (warning message benign)
    >> >> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
    SW> Warning message:
    SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
    SW> collapsing to unique 'x' values
    >> >> 
    >> >> ### apply to input-values
    >> >> y.out <- sapply(approx.data$input, interp)
    >> >> 
    >> >> ### still I find output values >1, even though yleft=1:
    >> >> range(y.out)
    SW> [1] 0.000000 7.207233
    >> 
    >> 
    >> I get completely different (and correct) results,
    >> by the way the *same* you have in the bug report you've
    >> submitted 
    >> (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14377)
    >> and which does *not* show any bug:
    >> 
    >>> range(y.out)
    >> [1] 0.0000000 0.9816907
    >> 
    >> Of course, I do believe that you've seen the above problems,
    >> -- on 64-bit Mac ? as you report in sessionInfo() ? --
    >> but I cannot reproduce them.
    >> 
    >> And also, you seem yourself to be able to get different results
    >> for the same data... what are the circumstances?

    > I think Greg Snow's analysis (linked from the bug report) is right. 
    > length(unique(x)) can be different than length(levels(as.factor(x))) 
    > (which is what tapply uses to determine the number of levels).

    > I've fixed it by replacing

    > y <- as.vector(tapply(y,x,ties))

    > with

    > y <- as.vector(tapply(y,match(x,x),ties))

    > since match() uses the same rules as unique.

    > An argument could be made that tapply should be changed instead, but it 
    > is behaving as documented, so I didn't want to do that.

I think you've worked around the problem (and I had seen Greg's
analysis), thank you. BTW, the problem wasn't there in R 2.8.x,
as we since had improved  factor()  .... 

{.. and btw, the same changes should happen to
  approx(), spline(), splinefun() ..}

*However*
why on earth can the results become *random*, i.e.,
differ from one call to the other, with identical data in the
same R session?
I currently guess that sometimes the operands (of the linear
interpolation rational arithmetic) are kept in high-precision
registers and sometimes not, I must say I am puzzled that this
happens dynamically rather than determined by the exact compiler
flags at (R's) build time

??

(and yes, this thread should have been on R-devel rather than R-help
 from the start ...)

Martin