[R] approxfun-problems (yleft and yright ignored)

Samuel Wuest wuests at tcd.ie
Thu Aug 26 15:34:26 CEST 2010


Hi Greg,
thanks for the suggestion:

I have attached some small dataset that can be used to reproduce the
odd behavior of the approxfun-function.

If it gets stripped off my email, it can also be downloaded at:
http://bioinf.gen.tcd.ie/approx.data.Rdata

Strangely, the problem seems specific to the data structure in my
expression set, when I use simulated data, everything worked fine.

Here is some code that I run and resulted in the strange output that I
have described in my initial post:

> ### load the data: a list called approx.data
> load(file="approx.data.Rdata")
> ### contains the slots "x", "y", "input"
> names(approx.data)
[1] "x"     "y"     "input"
> ### with y ranging between 0 and 1
> range(approx.data$y)
[1] 0 1
> ### compare ranges of x and input-x values (the latter is a small subset of 500 data points):
> range(approx.data$x)
[1] 3.098444 7.268812
> range(approx.data$input)
[1]  3.329408 13.026700
>
>
> ### generate the interpolation function (warning message benign)
> interp <- approxfun(approx.data$x, approx.data$y, yleft=1, yright=0, rule=2)
Warning message:
In approxfun(approx.data$x, approx.data$y, yleft = 1, yright = 0,  :
  collapsing to unique 'x' values
>
> ### apply to input-values
> y.out <- sapply(approx.data$input, interp)
>
> ### still I find output values >1, even though yleft=1:
> range(y.out)
[1] 0.000000 7.207233
> hist(y.out)
>
> ### and the input-data points for which strange interpolation does occur have no unusual distribution (however, they lie close to max(x)):
> hist(approx.data$input[which(y.out>1)])

The session info can be found below, thanks a million for any help.

Sam

On 25 August 2010 19:31, Greg Snow <Greg.Snow at imail.org> wrote:
> The plots did not come through, see the posting guide for which attachments are allowed.  It will be easier for us to help if you can send reproducible code (we can copy and paste to run, then examine, edit, etc.).  Try finding a subset of your data for which the problem still occurs, then send the data if possible, or similar simulated data if you cannot send original data.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>> project.org] On Behalf Of Samuel Wuest
>> Sent: Wednesday, August 25, 2010 8:20 AM
>> To: r-help at r-project.org
>> Subject: [R] approxfun-problems (yleft and yright ignored)
>>
>> Dear all,
>>
>> I have run into a problem when running some code implemented in the
>> Bioconductor panp-package (applied to my own expression data), whereby
>> gene
>> expression values of known true negative probesets (x) are interpolated
>> onto
>> present/absent p-values (y) between 0 and 1 using the *approxfun -
>> function*{stats}; when I have used R version 2.8, everything had
>> worked fine,
>> however, after updating to R 2.11.1., I got unexpected output
>> (explained
>> below).
>>
>> Please correct me here, but as far as I understand, the yleft and
>> yright
>> arguments set the extreme values of the interpolated y-values in case
>> the
>> input x-values (on whose approxfun is applied) fall outside range(x).
>> So if
>> I run approxfun with yleft=1 and yright=0 with y-values between 0 and
>> 1,
>> then I should never get any values higher than 1. However, this is not
>> the
>> case, as this code-example illustrates:
>>
>> > ### define the x-values used to construct the approxfun, basically
>> these
>> are 2000 expression values ranging from ~ 3 to 7:
>> > xNeg <- NegExprs[, 1]
>> > xNeg <- sort(xNeg, decreasing = TRUE)
>> >
>> > ### generate 2000 y-values between 0 and 1:
>> > yNeg <- seq(0, 1, 1/(length(xNeg) - 1))
>> > ### define yleft and yright as well as the rule to clarify what
>> should
>> happen if input x-values lie outside range(x):
>> > interp <- approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule=2)
>> Warning message:
>> In approxfun(xNeg, yNeg, yleft = 1, yright = 0, rule = 2) :
>>   collapsing to unique 'x' values
>> > ### apply the approxfun to expression data that range from ~2.9 to
>> 13.9
>> and can therefore lie outside range(xNeg):
>> >  PV <- sapply(AllExprs[, 1], interp)
>> > range(PV)
>> [1]    0.000 6208.932
>> > summary(PV)
>>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>> 0.000e+00 0.000e+00 2.774e-03 1.299e+00 3.164e-01 6.209e+03
>>
>> So the resulting output PV object contains data ranging from 0 to 6208,
>> the
>> latter of which lies outside yleft and is not anywhere close to extreme
>> y-values that were used to set up the interp-function. This seems wrong
>> to
>> me, and from what I understand, yleft and yright are simply ignored?
>>
>> I have attached a few histograms that visualize the data distributions
>> of
>> the objects I xNeg, yNeg, AllExprs[,1] (== input x-values) and PV (the
>> output), so that it is easier to make sense of the data structures...
>>
>> Does anyone have an explanation for this or can tell me how to fix the
>> problem?
>>
>> Thanks a million for any help, best, Sam
>>
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_IE.UTF-8/en_IE.UTF-8/C/C/en_IE.UTF-8/en_IE.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] panp_1.18.0   affy_1.26.1   Biobase_2.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0         preprocessCore_1.10.0
>>
>>
>> --
>> -----------------------------------------------------
>> Samuel Wuest
>> Smurfit Institute of Genetics
>> Trinity College Dublin
>> Dublin 2, Ireland
>> Phone: +353-1-896 2444
>> Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
>> Email: wuests at tcd.ie
>> ------------------------------------------------------
>
>



-- 
-----------------------------------------------------
Samuel Wuest
Smurfit Institute of Genetics
Trinity College Dublin
Dublin 2, Ireland
Phone: +353-1-896 2444
Web: http://www.tcd.ie/Genetics/wellmer-2/index.html
Email: wuests at tcd.ie
------------------------------------------------------


More information about the R-help mailing list