[R] mismatch between match and unique causing ecdf (well, approxfun) to fail

Meyners, Michael meyners.m at pg.com
Mon Jun 8 12:01:42 CEST 2015


All,

I encountered the following issue with ecdf. It originally arose with a vector of length 10,000, but I have been able to reduce it to a minimal reproducible example (just to avoid questions about why I'd want to do this for a vector of length 2...):

test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344), .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
ecdf(test2) 

# Error in xy.coords(x, y) : 'x' and 'y' lengths differ

In an attempt to track this down, it turns out that

unique(test2)
#       X817    X4789
#74 3.398247 3.398247

while 

match(test2, unique(test2))
#[1] 1 1

matches both values to the first one. This causes a hiccup in the call to ecdf, which uses (the equivalent of) a call to approxfun with x = test2 and y = cumsum(tabulate(match(test2, unique(test2)))). The latter now contains one entry fewer than the former, so xy.coords fails.
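The mismatch can be reproduced without calling ecdf at all. The following is only a minimal sketch of the two pieces as described above, not the actual ecdf source. The names vals, idx, y, v, vals_v and y_v are illustrative, and the data frame is rebuilt with data.frame() (so its row name differs from the original 74). The final comparison on a plain numeric vector is included only as a contrast.

```r
# Rebuild the example data frame: one row, two identical values.
test2 <- data.frame(X817 = 3.39824670255344, X4789 = 3.39824670255344)

# The two pieces described above, computed separately.
vals <- unique(test2)            # still one row with two columns
idx  <- match(test2, vals)       # both columns are matched to position 1
y    <- cumsum(tabulate(idx))    # a single cumulative count

length(test2)   # number of x entries: 2
length(y)       # number of y entries: 1

# For comparison only (assumption: the same values held in a plain
# numeric vector instead of a data frame): here the lengths agree.
v      <- unlist(test2)
vals_v <- unique(v)
y_v    <- cumsum(tabulate(match(v, vals_v)))
length(vals_v) == length(y_v)
```

With the data frame, x has two entries and y only one, which is the condition xy.coords complains about.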

I understand that the issue is probably related to FAQ 7.31, but I would have hoped that unique and match use the same precision, so that either both or neither would consider the two values identical, rather than match treating them as equal while unique does not.

Last but not least, it doesn't really cause an issue on my end (other than breaking my code and hence dropping out of a loop in the first place...); rounding will help without noteworthy changes to the outcome, so no need to propose a workaround :-) I'd rather like to raise the issue and learn whether there is a purpose for this behavior, whether there is a generic fix for it, or whether I am completely missing something.

Version info (under Windows 7): 
R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Platform: x86_64-w64-mingw32/x64 (64-bit)

Cheers, Michael 


