[R] approxfun-problems (yleft and yright ignored)
Martin Maechler
maechler at stat.math.ethz.ch
Sat Sep 11 17:13:31 CEST 2010
>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>> on Sat, 11 Sep 2010 10:32:38 -0400 writes:
> On 11/09/2010 10:04 AM, Martin Maechler wrote:
>>>>>>> "SW" == Samuel Wuest <wuests at tcd.ie>
>>>>>>> on Thu, 26 Aug 2010 14:34:26 +0100 writes:
>>
SW> Hi Greg,
SW> thanks for the suggestion:
>>
SW> I have attached some small dataset that can be used to reproduce the
SW> odd behavior of the approxfun-function.
>>
SW> If it gets stripped off my email, it can also be downloaded at:
SW> http://bioinf.gen.tcd.ie/approx.data.Rdata
>>
SW> Strangely, the problem seems specific to the data structure in my
SW> expression set, when I use simulated data, everything worked fine.
>>
SW> Here is some code that I run and resulted in the strange output that I
SW> have described in my initial post:
>>
>> >> ### load the data: a list called approx.data
>> >> load(file="approx.data.Rdata")
>> >> ### contains the slots "x", "y", "input"
>> >> names(approx.data)
SW> [1] "x" "y" "input"
>> >> ### with y ranging between 0 and 1
>> >> range(approx.data$y)
SW> [1] 0 1
>> >> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
>> >> range(approx.data$x)
SW> [1] 3.098444 7.268812
>> >> range(approx.data$input)
SW> [1] 3.329408 13.026700
>> >>
>> >>
>> >> ### generate the interpolation function (warning message benign)
>> >> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
SW> Warning message:
SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0, :
SW> collapsing to unique 'x' values
>> >>
>> >> ### apply to input-values
>> >> y.out <- sapply(approx.data$input, interp)
>> >>
>> >> ### still I find output values >1, even though yleft=1:
>> >> range(y.out)
SW> [1] 0.000000 7.207233
>>
>>
>> I get completely different (and correct) results,
>> by the way the *same* you have in the bug report you've
>> submitted
>> (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14377)
>> and which does *not* show any bug:
>>
>>> range(y.out)
>> [1] 0.0000000 0.9816907
>>
>> Of course, I do believe that you've seen the above problems,
>> -- on 64-bit Mac ? as you report in sessionInfo() ? --
>> but I cannot reproduce them.
>>
>> And also, you seem yourself to be able to get different results
>> for the same data... what are the circumstances?
> I think Greg Snow's analysis (linked from the bug report) is right.
> length(unique(x)) can be different than length(levels(as.factor(x)))
> (which is what tapply uses to determine the number of levels).
> I've fixed it by replacing
> y <- as.vector(tapply(y,x,ties))
> with
> y <- as.vector(tapply(y,match(x,x),ties))
> since match() uses the same rules as unique.
> An argument could be made that tapply should be changed instead, but it
> is behaving as documented, so I didn't want to do that.
I think you've worked around the problem (and I had seen Greg's
analysis), thank you. BTW, the problem wasn't there in R 2.8.x,
as we since had improved factor() ....
{.. and btw, the same changes should happen to
approx(), spline(), splinefun() ..}
*However*
why on earth can the results become *random*, i.e.,
differ from one call to the other, with identical data in the
same R session?
I currently guess that sometimes the operands (of the linear
interpolation rational arithmetic) are kept in high-precision
registers and sometimes not, I must say I am puzzled that this
happens dynamically rather than determined by the exact compiler
flags at (R's) build time
??
(and yes, this thread should have been on R-devel rather than R-help
from the start ...)
Martin
More information about the R-help
mailing list