[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?
Kevin B. Hendricks
kevin.hendricks at sympatico.ca
Sun Jul 30 16:11:21 CEST 2006
After playing with this some more and adding an implementation to
handle NAs in the data vector, I have run into the problem of what to
return when the only data values for a particular bin (or level) in
the data vector were NAs and the user selected na.rm=T:
1. Should it return 0 for the count of that particular bin and NA for
that bin from all of the other functions? If so, wouldn't it be
strange to return an NA only because no valid data is left in that bin
once the user has asked for na.rm=T?
2. Or do I have to literally rebuild the final result vector,
removing all "unused" bins before returning the results? And
wouldn't that cause problems, since the full set of levels 1:ngroups
would then be returned for some variables but not for others?
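To make the concern with approach 2 concrete, here is a small sketch in plain R. The helper name sum_drop_unused is hypothetical and is not part of any igroup code; it drops NAs first and then summarises only the groups that remain, so two data vectors that share the same grouping come back with different sets of levels.

```r
g  <- c(1, 1, 2, 2, 3, 3)
x1 <- c(1, 2, NA, NA, 5, 6)   # group 2 holds only NAs
x2 <- c(1, 2, 3, 4, NA, NA)   # group 3 holds only NAs

# Hypothetical helper: remove NAs, then summarise whatever groups are left.
sum_drop_unused <- function(x, g) {
  keep <- !is.na(x)
  tapply(x[keep], g[keep], sum)
}

s1 <- sum_drop_unused(x1, g)  # only groups "1" and "3" are returned
s2 <- sum_drop_unused(x2, g)  # only groups "1" and "2" are returned

names(s1)
names(s2)
```

The two results have the same length but describe different groups, so they can no longer be lined up by position against 1:ngroups.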
I personally like approach 1 better. If I give an igroup
function my groups and tell it to use na.rm=T on my data vector, I
would really want all group levels returned, not just the ones
that had valid data in them. If a particular group had no data, I
would want the count to be 0 for that bin and all of the other
functions to return NA for that bin.
Is that what you are returning in that case?
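A minimal sketch of the behaviour described under approach 1, written in plain R for illustration only. The function name group_summary_sketch and its argument names are assumptions, it is not the compiled igroup implementation, and it assumes igroup itself contains no NAs and only values in 1:ngroups.

```r
# Hypothetical helper: every group 1:ngroups is always returned.
# Empty groups get a count of 0 and NA for every other statistic.
group_summary_sketch <- function(x, igroup, ngroups, na.rm = TRUE) {
  if (na.rm) {
    keep   <- !is.na(x)
    x      <- x[keep]
    igroup <- igroup[keep]
  }
  counts <- tabulate(igroup, nbins = ngroups)  # integer; 0 for empty bins
  sums <- rep(NA_real_, ngroups)
  maxs <- rep(NA_real_, ngroups)
  mins <- rep(NA_real_, ngroups)
  # Simple loop for clarity; only bins that still hold data are filled in.
  for (k in which(counts > 0L)) {
    xk <- x[igroup == k]
    sums[k] <- sum(xk)
    maxs[k] <- max(xk)
    mins[k] <- min(xk)
  }
  list(counts = counts, sums = sums, maxs = maxs, mins = mins)
}

# Group 2 contains only NAs, so it ends up with count 0 and NA elsewhere.
res <- group_summary_sketch(c(1, NA, NA, 4, 5), c(1L, 2L, 2L, 3L, 3L), ngroups = 3)
res
```

Because the output always has length ngroups, results for different data vectors stay aligned by group index.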
Also, do you always return Sums, Maxs, and Mins as "numeric", or do
you sometimes return "integer" values if an "integer" data vector is
passed in?
Are "Counts" always returned as "integer", or do you always set them
to "numeric", or does that vary with the type of the data vector?
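For comparison, this is how the ungrouped base R functions treat storage type; mirroring them would be one possible convention for the grouped versions, not necessarily what any existing implementation does.

```r
xi <- c(3L, 1L, 2L)   # integer data vector
xd <- c(3, 1, 2)      # double ("numeric") data vector

t_sum_int   <- typeof(sum(xi))                      # integer input stays integer
t_max_int   <- typeof(max(xi))                      # likewise for max and min
t_min_int   <- typeof(min(xi))
t_mean_int  <- typeof(mean(xi))                     # mean is always double
t_sum_dbl   <- typeof(sum(xd))                      # double input gives double
t_counts    <- typeof(tabulate(c(1L, 2L, 2L)))      # counts are integer

# One cost of keeping integer sums: overflow yields NA (with a warning).
overflowed <- suppressWarnings(sum(c(.Machine$integer.max, 1L)))

c(t_sum_int, t_max_int, t_min_int, t_mean_int, t_sum_dbl, t_counts)
```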
Do you handle "complex" data vectors in a similar fashion (i.e., using
the length (modulus) of each complex value as its value for Maxs, Mins, etc.)?
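For reference, base R's max() and min() refuse complex input outright, while sum() accepts it. Ordering by modulus via Mod(), as raised above, is one possible convention; the sketch below only illustrates it and makes no claim about what any grouped implementation does.

```r
z <- c(3+4i, 1+0i, 0-2i)

# max() on a complex vector is an error in base R.
max_fails <- inherits(try(max(z), silent = TRUE), "try-error")

# sum() is defined for complex vectors.
z_sum <- sum(z)

# Ordering by modulus: pick the element with the largest / smallest Mod().
mods      <- Mod(z)
z_largest  <- z[which.max(mods)]
z_smallest <- z[which.min(mods)]

list(max_fails = max_fails, sum = z_sum, largest = z_largest, smallest = z_smallest)
```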