[R] strange classification behaviour
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Nov 11 07:30:01 CET 2005
You could use cut. The key calculation would be:
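w <- .05; eps <- 1e-5    # bin width and a small tolerance for floating-point error
breakpoints <- seq(min(kk), max(kk), w)
breakpoints <- floor( (breakpoints + (w/2) + eps) / w) * w    # snap breakpoints to multiples of w
# right = FALSE makes the intervals closed on the left, [a, b);
# ordered_result = TRUE keeps empty intermediate levels (ordered(values) would drop them)
values <- cut(kk, c(breakpoints, Inf), right = FALSE, ordered_result = TRUE)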
If you don't like the labels produced, add labels = breakpoints as an argument to cut().
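A variation on the same idea, offered as a sketch rather than as either poster's code: compute each value's bin as a whole-number index, build the levels from those whole numbers, and only turn them into decimal labels at the end. Level matching then compares whole numbers, so no floating-point tolerance is needed. The function name classify_int() and its arguments are hypothetical, and the sketch assumes round-half-up binning to the nearest multiple of the width, as in the Classify() function quoted below.

```r
# Hypothetical helper (not from the thread): bin numeric values to the nearest
# multiple of `width`, keeping every intermediate bin as a factor level.
# Bins are identified by whole-number indices, so matching values to levels
# never compares floating-point multiples of `width`.
classify_int <- function(values, width = 0.05, ordered.factor = TRUE) {
  k <- floor(values / width + 0.5)        # index of the nearest bin centre
  k_range <- range(k, finite = TRUE)      # lowest and highest occupied bins
  all_k <- seq(k_range[1], k_range[2])    # every bin in between, as whole numbers
  factor(k, levels = all_k,
         labels = format(all_k * width, trim = TRUE),  # decimal labels, e.g. "0.30"
         ordered = ordered.factor)
}

kk <- c(0.854189, 0.374423, 0.522893, 0.670796, 0.913540,
        0.979011, 0.510378, 0.320440, -0.576764, 0.940343)
f <- classify_int(kk, width = 0.05)
summary(f)
```

The labels share a common number of decimals ("1.00" rather than "1"); pass other labels to factor() if that matters. floor(x / w + 0.5) is mathematically the same as Classify()'s floor((x + w/2) / w), but a value lying exactly on a bin edge may still fall on either side because of rounding.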
On 11/10/05, René J.V. Bertin <rjvbertin at gmail.com> wrote:
> Hello,
>
> I've written a routine that takes an input vector and returns a 'binned' version of it with a requested bin width, converted to an ordered factor by default. It also tries to make sure that every factor level between the minimum and maximum of the input is present, even if no value falls in it.
>
> This is the code as I currently have it:
>
> Classify <- function( values, ClassWidth=0.05, ordered.factor=TRUE, all=TRUE )
> {
>     valuesName <- deparse(substitute(values))
>     if( is.numeric(values) ){
>         values <- floor( (values+ (ClassWidth/2) ) / ClassWidth ) * ClassWidth
>         # determine the numerical range of the input
>         levels <- range( values, finite=TRUE )
>         if( ordered.factor ){
>             if( all ){
>                 # if we want all levels, construct a levels vector that can be passed to factor's levels argument:
>                 levels <- seq( levels[1], levels[2], by=ClassWidth )
>                 values <- factor(values, levels=levels, ordered=TRUE )
>             }
>             else{
>                 values <- factor(values, ordered=TRUE )
>             }
>         }
>     }
>     else{
>         levels <- range( values, finite=TRUE )
>         if( all ){
>             levels <- seq( levels[1], levels[2], by=ClassWidth )
>             values <- factor( values, levels=levels, ordered=ordered.factor )
>         }
>         else{
>             values <- factor( values, ordered=ordered.factor )
>         }
>     }
>     comment(values) <- paste( comment(values),
>         "; Classify(", valuesName, ", ClassWidth=", ClassWidth, ", ordered.factor=", ordered.factor, ")",
>         sep="")
>     values
> }
>
> This does work, but has some strange side-effects that I think might be due to rounding errors:
>
> ##> kk<-c( 0.854189, 0.374423, 0.522893, 0.670796, 0.913540, 0.979011, 0.510378, 0.320440, -0.576764, 0.940343 )
>
> ##> Classify( kk, ClassWidth=0.05, all=FALSE )
> [1] 0.85 0.35 0.5 0.65 0.9 1 0.5 0.3 -0.6 0.95
> Levels: -0.6 < 0.3 < 0.35 < 0.5 < 0.65 < 0.85 < 0.9 < 0.95 < 1
> ### result as expected, but using this on the horizontal axis of a graph can be ... surprising.
>
> ##> Classify( kk, ClassWidth=0.05, all=TRUE )
> [1] 0.85 <NA> 0.5 <NA> <NA> 1 0.5 <NA> -0.6 <NA>
> 33 Levels: -0.6 < -0.55 < -0.5 < -0.45 < -0.4 < -0.35 < -0.3 < -0.25 < -0.2 < -0.15 < -0.1 < -0.05 < 0 < ... < 1
> ##> summary( Classify( kk, ClassWidth=0.05, all=TRUE ) )
>  -0.6 -0.55  -0.5 -0.45  -0.4 -0.35
>     1     0     0     0     0     0
>  -0.3 -0.25  -0.2 -0.15  -0.1 -0.05
>     0     0     0     0     0     0
>     0 0.0499999999999999   0.1  0.15   0.2  0.25
>     0                  0     0     0     0     0
>   0.3  0.35   0.4  0.45   0.5  0.55
>     0     0     0     0     2     0
>   0.6  0.65   0.7  0.75   0.8  0.85
>     0     0     0     0     0     1
>   0.9  0.95     1  NA's
>     0     0     1     5
>
> ### ???
>
> What probably happens is that the values in my input that classify to levels such as 0.3 or 0.35 are not found in the list of levels I calculate, because of rounding errors. Adding an element 0.05 to kk supports this idea.
>
> Is there a way around this, for instance a more robust way to do what I'm trying to do here (or a function provided by R)?
>
> When I modify the relevant code above to
>
> levels <- floor( (seq( levels[1], levels[2], by=ClassWidth ) + (ClassWidth/2)) / ClassWidth ) * ClassWidth
> values <- factor( values, levels=levels, ordered=TRUE )
>
> the result is as expected, but I find that not very elegant...
>
> Thanks in advance,
> René Bertin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
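The rounding diagnosis in the question can be checked directly with an exact numeric comparison. The sketch below is an illustration, not code from the thread: snap() is a hypothetical helper that reproduces Classify()'s binning formula, and the levels are built the way Classify(all = TRUE) builds them, by stepping from the lowest bin with seq(). It then applies the same formula to the levels, which is the idea behind the modification quoted above.

```r
# Classic illustration: decimal fractions are not exact in binary doubles
0.1 + 0.2 == 0.3                      # FALSE

kk <- c(0.854189, 0.374423, 0.522893, 0.670796, 0.913540,
        0.979011, 0.510378, 0.320440, -0.576764, 0.940343)
w <- 0.05

# Hypothetical helper reproducing Classify()'s binning formula
snap <- function(x, w) floor((x + w/2) / w) * w

binned <- snap(kk, w)

# Levels built as in Classify(all = TRUE): from + i * w, starting at the
# lowest bin. This need not reproduce k * w bit for bit.
r <- range(binned, finite = TRUE)
raw_levels <- seq(r[1], r[2], by = w)

binned %in% raw_levels                            # exact matching: some bins are not found
print(c(binned[8], raw_levels[19]), digits = 17)  # two slightly different "0.3"s
isTRUE(all.equal(binned[8], raw_levels[19]))      # equal within a tolerance

# Passing the levels through the same formula makes both sides an integer
# times w computed the same way, so exact matching succeeds
snapped_levels <- snap(raw_levels, w)
all(binned %in% snapped_levels)                   # every bin is now found
```

all.equal() is the usual tool when two doubles only need to agree up to a tolerance; for building factor levels, computing both sides with identical arithmetic (or working with whole-number bin indices) avoids the need for a tolerance altogether.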