[R] approxfun-problems (yleft and yright ignored)

Greg Snow Greg.Snow at imail.org
Thu Aug 26 22:43:51 CEST 2010


It looks like you have found a bug. Using your data on my computer, I can confirm that I get nonsense results for some cases. Even when I call the function on element 164 of the input vector alone, the results are not consistent: I ran it several times and often got different results.
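
The kind of consistency check described above can be sketched as follows. This is only an illustration on simulated data: the vectors x and y and the test point x0 are made up (they are not approx.data), and the rounding is just a way to create tied x values so that the "collapsing to unique 'x' values" path is exercised. On a correctly working installation, all repetitions should agree and stay between 0 and 1.

```r
# Simulated stand-ins (assumed values, not the poster's data).
set.seed(1)
x <- round(runif(200, 3, 7), 1)          # rounding creates tied x values
y <- seq(0, 1, length.out = length(x))   # y values between 0 and 1

# Same call pattern as in the original post; the tie warning is benign here.
interp <- suppressWarnings(
  approxfun(x, y, yleft = 1, yright = 0, rule = 2)
)

# Evaluate the same single input many times and compare the results.
x0 <- 6.95                               # hypothetical point near max(x)
reps <- replicate(20, interp(x0))

consistent <- length(unique(reps)) == 1  # TRUE if every call agrees
in.range   <- all(reps >= 0 & reps <= 1) # TRUE if no value escapes [0, 1]

print(consistent)
print(in.range)
```

If either check comes back FALSE on the real data, that output (together with sessionInfo()) would be useful to include in the bug report.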

I looked at the code for the function, and I think the problem is not in the R code but in the compiled C portion.  You should send this in as a bug report.


One note on your code: you don't need sapply. The function returned by approxfun is vectorized, so you can just call interp(approx.data$input) and it will return the vector of interpolated values.
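
As a minimal sketch of the point about sapply (the vectors x, y and input below are simulated stand-ins chosen for illustration, not the contents of approx.data), the direct call and the sapply call should give the same values. It also shows the documented role of yleft and yright, which are the values returned for inputs to the left and right of range(x):

```r
# Simulated stand-ins for approx.data$x, approx.data$y and approx.data$input
# (assumed values, for illustration only).
x     <- c(3, 4, 5, 6, 7)
y     <- c(1, 0.75, 0.5, 0.25, 0)
input <- c(2.5, 3.5, 6.5, 13)            # 2.5 and 13 lie outside range(x)

# yleft/yright are returned for inputs below min(x) / above max(x).
interp <- approxfun(x, y, yleft = 1, yright = 0, rule = 2)

# The returned function accepts a whole vector, so sapply is not needed.
y.vec <- interp(input)
y.sap <- sapply(input, interp)

print(y.vec)
print(isTRUE(all.equal(y.vec, y.sap)))
```

Besides being shorter, the direct call evaluates all inputs in a single call into the compiled code rather than one call per element.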

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: Samuel Wuest [mailto:wuests at tcd.ie]
> Sent: Thursday, August 26, 2010 7:34 AM
> To: Greg Snow
> Cc: r-help at r-project.org
> Subject: Re: [R] approxfun-problems (yleft and yright ignored)
> 
> Hi Greg,
> thanks for the suggestion:
> 
> I have attached a small dataset that can be used to reproduce the
> odd behavior of the approxfun function.
> 
> If it gets stripped off my email, it can also be downloaded at:
> http://bioinf.gen.tcd.ie/approx.data.Rdata
> 
> Strangely, the problem seems specific to the data structure in my
> expression set: when I used simulated data, everything worked fine.
> 
> Here is some code that I ran and that resulted in the strange output
> that I described in my initial post:
> 
> > ### load the data: a list called approx.data
> > load(file="approx.data.Rdata")
> > ### contains the slots "x", "y", "input"
> > names(approx.data)
> [1] "x"     "y"     "input"
> > ### with y ranging between 0 and 1
> > range(approx.data$y)
> [1] 0 1
> > ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
> > range(approx.data$x)
> [1] 3.098444 7.268812
> > range(approx.data$input)
> [1]  3.329408 13.026700
> >
> >
> > ### generate the interpolation function (warning message benign)
> > interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
> Warning message:
> In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
>   collapsing to unique 'x' values
> >
> > ### apply to input-values
> > y.out <- sapply(approx.data$input, interp)
> >
> > ### still I find output values >1, even though yleft=1:
> > range(y.out)
> [1] 0.000000 7.207233
> > hist(y.out)
> >
> > ### and the input-data points for which strange interpolation does occur have no unusual distribution (however, they lie close to max(x)):
> > hist(approx.data$input[which(y.out>1)])
> 
> The session info can be found below, thanks a million for any help.
> 
> Sam
> 
> On 25 August 2010 19:31, Greg Snow <Greg.Snow at imail.org> wrote:
> > The plots did not come through, see the posting guide for which
> > attachments are allowed.  It will be easier for us to help if you
> > can send reproducible code (we can copy and paste to run, then
> > examine, edit, etc.).  Try finding a subset of your data for which
> > the problem still occurs, then send the data if possible, or similar
> > simulated data if you cannot send original data.
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.snow at imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Samuel Wuest
> >> Sent: Wednesday, August 25, 2010 8:20 AM
> >> To: r-help at r-project.org
> >> Subject: [R] approxfun-problems (yleft and yright ignored)
> >>
> >> Dear all,
> >>
> >> I have run into a problem when running some code implemented in the
> >> Bioconductor panp package (applied to my own expression data),
> >> whereby gene expression values of known true negative probesets (x)
> >> are interpolated onto present/absent p-values (y) between 0 and 1
> >> using the approxfun function {stats}. When I used R version 2.8,
> >> everything worked fine; however, after updating to R 2.11.1, I got
> >> unexpected output (explained below).
> >>
> >> Please correct me here, but as far as I understand, the yleft and
> >> yright arguments set the extreme values of the interpolated
> >> y-values in case the input x-values (to which approxfun is applied)
> >> fall outside range(x). So if I run approxfun with yleft=1 and
> >> yright=0 with y-values between 0 and 1, then I should never get any
> >> values higher than 1. However, this is not the case, as this code
> >> example illustrates:
> >>
> >> > ### define the x-values used to construct the approxfun, basically these are 2000 expression values ranging from ~ 3 to 7:
> >> > xNeg <- NegExprs[, 1]
> >> > xNeg <- sort(xNeg, decreasing = TRUE)
> >> >
> >> > ### generate 2000 y-values between 0 and 1:
> >> > yNeg <- seq(0, 1, 1/(length(xNeg) - 1))
> >> > ### define yleft and yright as well as the rule to clarify what should happen if input x-values lie outside range(x):
> >> > interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule=2)
> >> Warning message:
> >> In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) :
> >>   collapsing to unique 'x' values
> >> > ### apply the approxfun to expression data that range from ~2.9 to 13.9 and can therefore lie outside range(xNeg):
> >> >  PV <- sapply(AllExprs[, 1], interp)
> >> > range(PV)
> >> [1]    0.000 6208.932
> >> > summary(PV)
> >>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> >> 0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03
> >>
> >> So the resulting output object PV contains data ranging from 0 to
> >> 6208, the latter of which lies far above yleft and is not anywhere
> >> close to the extreme y-values that were used to set up the interp
> >> function. This seems wrong to me; from what I understand, yleft and
> >> yright are simply ignored?
> >>
> >> I have attached a few histograms that visualize the data
> >> distributions of the objects xNeg, yNeg, AllExprs[,1] (== input
> >> x-values) and PV (the output), so that it is easier to make sense
> >> of the data structures...
> >>
> >> Does anyone have an explanation for this or can tell me how to fix
> >> the problem?
> >>
> >> Thanks a million for any help, best, Sam
> >>
> >> > sessionInfo()
> >> R version 2.11.1 (2010-05-31)
> >> x86_64-apple-darwin9.8.0
> >>
> >> locale:
> >> [1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>
> >> other attached packages:
> >> [1] panp_1.18.0   affy_1.26.1   Biobase_2.8.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] affyio_1.16.0         preprocessCore_1.10.0
> >>
> >>
> >> --
> >> -----------------------------------------------------
> >> Samuel Wuest
> >> Smurfit Institute of Genetics
> >> Trinity College Dublin
> >> Dublin 2, Ireland
> >> Phone: +353-1-896 2444
> >> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
> >> Email: wuests at tcd.ie
> >> ------------------------------------------------------
> >
> >
> 
> 
> 
> --
> -----------------------------------------------------
> Samuel Wuest
> Smurfit Institute of Genetics
> Trinity College Dublin
> Dublin 2, Ireland
> Phone: +353-1-896 2444
> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
> Email: wuests at tcd.ie
> ------------------------------------------------------


