[R] approxfun-problems (yleft and yright ignored)

Duncan Murdoch murdoch.duncan at gmail.com
Sat Sep 11 17:08:56 CEST 2010


On 11/09/2010 10:53 AM, Martin Maechler wrote:
>>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>>     on Sat, 11 Sep 2010 16:04:37 +0200 writes:
> 
>>>>>> "SW" == Samuel Wuest <wuests at tcd.ie>
>>>>>>     on Thu, 26 Aug 2010 14:34:26 +0100 writes:
> 
>     SW> Hi Greg,
>     SW> thanks for the suggestion:
> 
>     SW> I have attached some small dataset that can be used to reproduce the
>     SW> odd behavior of the approxfun-function.
> 
>     SW> If it gets stripped off my email, it can also be downloaded at:
>     SW> http://bioinf.gen.tcd.ie/approx.data.Rdata
> 
>     SW> Strangely, the problem seems specific to the data structure in my
>     SW> expression set, when I use simulated data, everything worked fine.
> 
>     SW> Here is some code that I run and resulted in the strange output that I
>     SW> have described in my initial post:
> 
>     >>> ### load the data: a list called approx.data
>     >>> load(file="approx.data.Rdata")
>     >>> ### contains the slots "x", "y", "input"
>     >>> names(approx.data)
>     SW> [1] "x"     "y"     "input"
>     >>> ### with y ranging between 0 and 1
>     >>> range(approx.data$y)
>     SW> [1] 0 1
>     >>> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
>     >>> range(approx.data$x)
>     SW> [1] 3.098444 7.268812
>     >>> range(approx.data$input)
>     SW> [1]  3.329408 13.026700
>     >>> 
>     >>> 
>     >>> ### generate the interpolation function (warning message benign)
>     >>> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
>     SW> Warning message:
>     SW> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
>     SW> collapsing to unique 'x' values
>     >>> 
>     >>> ### apply to input-values
>     >>> y.out <- sapply(approx.data$input, interp)
>     >>> 
>     >>> ### still I find output values >1, even though yleft=1:
>     >>> range(y.out)
>     SW> [1] 0.000000 7.207233
> 
> 
>     MM> I get completely different (and correct) results,
>     MM> by the way the *same* you have in the bug report you've
>     MM> submitted 
>     MM> (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14377)
>     MM> and which does *not* show any bug:
> 
>     >> range(y.out)
>     MM> [1] 0.0000000 0.9816907
> 
>     MM> Of course, I do believe that you've seen the above problems,
>     MM> -- on 64-bit Mac ? as you report in sessionInfo() ? --
>     MM> but I cannot reproduce them.
> 
>     MM> And also, you seem yourself to be able to get different results
>     MM> for the same data... what are the circumstances?
> 
> I now see that you *did* mention the fact that you
> see *different* results when you *RE*run the same code
> on this data.
> The subject (" ...yleft and yright ignored") is misleading in
> any case.  These are not at all ignored...,
> but indeed (as Duncan Murdoch has noted on the bug website),
> there *is* a bug,
> so you are principally right in reporting -- thank you!
> ....
> and I could also confirm that -- as you mentioned in your first
> post -- this bug does not appear when using R 2.8.1 (at least on
> my platform).

I see this too -- and it appears to be because as.factor() finds a 
different number of levels in R-devel than it did in 2.8.1.  In 2.8.1, 
the level names are not unique, but in R-devel, they are, so there are 
fewer of them.

In 2.8.1, I see

sort(levels(as.factor(approx.data$x)))[285:287]
[1] "3.97124402436476" "3.97124402436476" "3.97129741844245"

(notice the first two are identical), whereas in R-devel, we get

[1] "3.97124402436476" "3.97129741844245" "3.97408959567848"

from the same expression.
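
A plausible mechanism for the duplicated level names, offered only as a sketch: two x values can be distinct as doubles yet print identically at 15 significant digits. The values below are made up for illustration (they are not taken from approx.data), and sprintf("%.15g", ...) is used here as a stand-in for the 15-digit labels; it is not necessarily what as.factor() does internally.

```r
# Hypothetical values: 'a' and 'b' differ only in the last few bits.
a <- 3.97124402436476
b <- a + 1e-15
x <- c(a, b, 3.97129741844245)

# As doubles, all three values are distinct.
n_unique_values <- length(unique(x))

# Formatted to 15 significant digits, the first two collapse to one string.
labels <- sprintf("%.15g", x)
n_unique_labels <- length(unique(labels))

print(a == b)            # the numbers themselves are not equal
print(labels)            # the first two labels are the same string
print(c(values = n_unique_values, labels = n_unique_labels))
```

If grouping is done by label in one place and by numeric value in another, the two counts disagree, which would be consistent with the different number of levels seen above.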

Duncan Murdoch
