[R] mismatch between match and unique causing ecdf (well, approxfun) to fail

Martin Maechler maechler at stat.math.ethz.ch
Mon Jun 8 16:42:43 CEST 2015


> Aehm, adding on this: I incorrectly *assumed* without testing that rounding would help; it doesn't:
> ecdf(round(test2,0)) 	# a rounding that is way too rough for my application...
> #Error in xy.coords(x, y) : 'x' and 'y' lengths differ
> 
> Digging deeper: The initially mentioned call to unique() is not very helpful, as test2 is a data frame, so I get what I deserve, an unchanged data frame with 1 row. Still, the issue remains and can even be simplified further:
> 
> > ecdf(data.frame(a=3, b=4))
> Empirical CDF 
> Call: ecdf(data.frame(a = 3, b = 4))
>  x[1:2] =      3,      4
> 
> works ok, but
> 
> > ecdf(data.frame(a=3, b=3))
> Error in xy.coords(x, y) : 'x' and 'y' lengths differ
> 
> doesn't (same for a=b=1 or 2, so likely the same for any a=b). Instead, 
> 
> > ecdf(c(a=3, b=3))
> Empirical CDF 
> Call: ecdf(c(a = 3, b = 3))
>  x[1:1] =      3
> 
> does the trick. From ?ecdf, I gather that x should be a numeric vector, so apparently I misused the function by applying it to a row of a data frame (i.e. a data frame with one row). That worked fine in all my other (dozens of) cases, though, just not in this particular one. A simple unlist() helps:

You were lucky.  Using a one-row data frame instead of a
numeric vector will typically *not* work unless ... well, you
are lucky.

No, do *not* pass data frame rows instead of numeric vectors.
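
A minimal sketch of why a one-row data frame is not a drop-in
replacement for a numeric vector. The names df and v are only
illustrative; the data frame is the one from the quoted example.
It shows only how unique() treats the two kinds of object and
does not trace through the internals of ecdf():

```r
df <- data.frame(a = 3, b = 3)   # one row, two equal values

is.numeric(df)     # FALSE: a data frame is a list of columns
is.list(df)        # TRUE

# unique() on a data frame drops duplicated *rows*, not duplicated
# values, so the single row comes back unchanged with both columns.
dim(unique(df))    # 1 2

# On a plain numeric vector the duplicated value is removed.
v <- unlist(df)    # named numeric vector c(a = 3, b = 3)
unique(v)          # 3
```

So code that expects unique(x) to return the distinct values of x
gets something quite different when x is a data frame.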

> 
> > ecdf(unlist(data.frame(a=3, b=3)))
> Empirical CDF 
> Call: ecdf(unlist(data.frame(a = 3, b = 3)))
>  x[1:1] =      3
> 
> Yet, I'm even more confused than before: in my other data, there were also duplicated values in the vector (1-row-data frame), and it never caused any issue. For this particular example, it does. I must be missing something fundamental...
>  

Well ... I'm confused about why you are confused,
but if you are thinking about passing rows of data frames as
numeric vectors, you must be sure that your data frame
contains only "classical numbers" (no factors, no 'Date's,
no...).

In such a case, transform your data frame to a numeric matrix
*once*, preferably using  data.matrix(<d.fr>)  instead of just  as.matrix(<d.fr>),
though in this case it should not matter.
Then *check* the result, and work with that matrix from then on.
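
The convert-once, check, then-use workflow could look roughly like
this. It is only a sketch: the data frame d and the names m, row1
and Fn are made up for illustration, and the check shown is one
possible choice, not a prescribed one:

```r
# Hypothetical example data: every column holds plain numbers.
d <- data.frame(a = c(3, 1), b = c(3, 4), c = c(5, 4))

# Convert *once*. data.matrix() always returns a numeric matrix,
# whereas as.matrix() returns a character matrix as soon as one
# column is non-numeric.
m <- data.matrix(d)

# Check the result before relying on it.
stopifnot(is.numeric(m), identical(dim(m), dim(d)))

# A row of the matrix is a (named) numeric vector, which is what
# ecdf() is documented to take.
row1 <- m[1, ]      # c(a = 3, b = 3, c = 5)
Fn <- ecdf(row1)
Fn(3)               # 2/3: two of the three values are <= 3
```

If d contained a factor, data.matrix() would silently replace it
by its integer codes, which is one more reason to check the result.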

Anything else will probably continue to leave you confused ..
;-)

Martin Maechler, 
ETH Zurich


