[R] mismatch between match and unique causing ecdf (well, approxfun) to fail
Meyners, Michael
meyners.m at pg.com
Mon Jun 8 12:51:26 CEST 2015
Erm, to add to this: I incorrectly *assumed*, without testing, that rounding would help; it doesn't:
ecdf(round(test2,0)) # a rounding that is way too rough for my application...
#Error in xy.coords(x, y) : 'x' and 'y' lengths differ
Digging deeper: The initially mentioned call to unique() is not very helpful, as test2 is a data frame, so I get what I deserve, an unchanged data frame with 1 row. Still, the issue remains and can even be simplified further:
> ecdf(data.frame(a=3, b=4))
Empirical CDF
Call: ecdf(data.frame(a = 3, b = 4))
x[1:2] = 3, 4
works ok, but
> ecdf(data.frame(a=3, b=3))
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead,
> ecdf(c(a=3, b=3))
Empirical CDF
Call: ecdf(c(a = 3, b = 3))
x[1:1] = 3
does the trick. From ?ecdf, I gather that x should be a numeric vector, so apparently I misused the function by applying it to a row of a data frame (i.e. a data frame with one row). In all my other (dozens of) cases that worked OK, though, just not in this particular one. A simple unlist() helps:
> ecdf(unlist(data.frame(a=3, b=3)))
Empirical CDF
Call: ecdf(unlist(data.frame(a = 3, b = 3)))
x[1:1] = 3
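To guard against this elsewhere, the unlist() step can be wrapped in a small helper. This is only a sketch: row_ecdf is a made-up name, not something from base R, and it assumes every column in the selected row is numeric.

```r
# Hypothetical helper (not part of base R): ECDF of one row of a data frame.
row_ecdf <- function(df, i = 1) {
  # Flatten the selected row into a plain atomic vector, dropping column names.
  x <- unlist(df[i, , drop = FALSE], use.names = FALSE)
  # Assumption: all columns are numeric; stop early otherwise.
  stopifnot(is.numeric(x))
  ecdf(x)
}

Fn <- row_ecdf(data.frame(a = 3, b = 3))
knots(Fn)  # the single distinct value, 3
Fn(3)      # 1: all observations are <= 3
```

Because ecdf() then only ever sees a numeric vector, duplicated values in the row are handled by the usual vector methods of unique() and match().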
Yet, I'm even more confused than before: in my other data, there were also duplicated values in the vector (one-row data frame), and it never caused any issue. For this particular example, it does. I must be missing something fundamental...
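The length mismatch described in the original message (quoted below) can be replayed by hand. The sketch below is a stand-in under stated assumptions, not the actual ecdf() source: length_check is a made-up helper, it follows the unique()/match()/tabulate() steps as described in that message, and it leaves out the initial sort(), which does not matter for these already-ordered inputs.

```r
# Hypothetical helper: compare the lengths of the x and y that would be
# handed to approxfun(), following the steps described in the original message.
length_check <- function(x) {
  vals <- unique(x)            # for a data frame: unique *rows*, columns are kept
  idx  <- match(x, vals)       # a data frame is matched element by element (per column)
  y    <- cumsum(tabulate(idx))
  c(n_x = length(vals), n_y = length(y))
}

df_same  <- data.frame(a = 3, b = 3)          # last column repeats an earlier value
df_diff  <- data.frame(a = 3, b = 4)          # all values distinct
df_mixed <- data.frame(a = 3, b = 3, c = 4)   # duplicate, but not in the last column

length_check(df_same)        # n_x = 2, n_y = 1: lengths differ
length_check(df_diff)        # n_x = 2, n_y = 2
length_check(df_mixed)       # n_x = 3, n_y = 3
length_check(c(a = 3, b = 3))  # plain vector: n_x = 1, n_y = 1
```

In this sketch the lengths only disagree when the last column repeats an earlier value, because tabulate() stops at the largest index it sees. That may be why other rows with duplicates went through without an error, though this is not verified against the original data.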
Michael
> -----Original Message-----
> From: Meyners, Michael
> Sent: Montag, 8. Juni 2015 12:02
> To: 'r-help at r-project.org'
> Subject: mismatch between match and unique causing ecdf (well,
> approxfun) to fail
>
> All,
>
> I encountered the following issue with ecdf which was originally on a vector
> of length 10,000, but I have been able to reduce it to a minimal reproducible
> example (just to avoid questions why I'd want to do this for a vector of
> length 2...):
>
> test2 = structure(list(X817 = 3.39824670255344, X4789 = 3.39824670255344),
> .Names = c("X817", "X4789"), row.names = 74L, class = "data.frame")
> ecdf(test2)
>
> # Error in xy.coords(x, y) : 'x' and 'y' lengths differ
>
> In an attempt to track this down, it turns out that
>
> unique(test2)
> # X817 X4789
> #74 3.398247 3.398247
>
> while
>
> match(test2, unique(test2))
> #[1] 1 1
>
> matches both values to the first one. This causes a hiccup in the call to ecdf,
> as this uses (an equivalent to) a call to approxfun with x = test2 and y =
> cumsum(tabulate(match(test2, unique(test2)))), the latter now containing
> one entry less than the former, so xy.coords fails.
>
> I understand that the issue should be somehow related to FAQ 7.31, but I
> would have hoped that unique and match would be using the same precision
> and hence both or neither would consider the two values identical, rather
> than match treating them as equal while unique doesn't.
>
> Last but not least, it doesn't really cause an issue on my end (other than
> breaking my code and hence dropping out of a loop in the first place...);
> rounding will help w/o noteworthy changes to the outcome, so no need to propose a
> workaround :-) I'd rather like to raise the issue and learn whether there is a
> purpose for this behavior, and/or whether there is a generic fix to this, or
> whether I am completely missing something.
>
> Version info (under Windows 7):
> R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> Cheers, Michael