[R] How to break data in quantiles properly?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Apr 27 18:12:27 CEST 2005
Eric Rodriguez wrote:
> Hi,
>
> I would like to break a dataset into n.classes quantiles.
> Until now, I have used the following code:
> Classify.Quantile <- function (dataset, nclasses = 10)
> {
>   n.probs <- seq(0,1,length=nclasses+1)
>   n.labels = paste("C", 1:nclasses-1, sep="")
>   n.rows <- nrow(dataset)
>   n.cols <- ncol(dataset)
>   n.motif <- dataset
>
>   for (j in 2:n.cols)
>   {
>     cat(j, " ");
>     discr = n.labels[unclass(cut(dataset[,j],
>                                  quantile(dataset[,j],n.probs),
>                                  include.lowest=T))]
>     n.motif[,j] = discr
>   }
>
>   res <- list(motif=n.motif, labels=n.labels, n.classes=nclasses)
>   return(res)
> }
>
>
> but if you call this on a dataset with many identical values, you get:
> Error in cut.default(dataset[, j], quantile(dataset[, j], n.probs),
> include.lowest = T) :
> cut: breaks are not unique
>
> I understand why this happens, but I would like to know how to avoid it.
>
> For example, this code raises the error (the function skips column 1,
> so the matrix needs at least two columns):
> x=matrix(0,1000,2)
> x[100,2]=1
> Classify.Quantile(x, 10)
>
> Of course this dataset is a bit extreme, but data with very small
> variance do occur in practice.
>
>
> Thanks for any help you could provide
The cut2 function in the Hmisc package may help. -FH
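
As a rough sketch of what this can look like in practice: the error comes from tied values producing duplicated quantile breaks, so one base R workaround is to drop the duplicates before calling cut(). The helper name classify_quantile_unique below is hypothetical (it is not part of Hmisc or of the original code), and the price of this approach is that heavily tied data end up with fewer classes than requested. If Hmisc is installed, cut2() takes the number of quantile groups through its g argument and can be called on the same vector directly.

```r
# Sketch only: classify_quantile_unique is a hypothetical helper,
# not part of Hmisc and not the original poster's function.
x <- c(rep(0, 999), 1)   # heavily tied data, as in the question

# Base R workaround: drop duplicated quantile breaks before calling cut().
classify_quantile_unique <- function(v, nclasses = 10) {
  brks <- unique(quantile(v, seq(0, 1, length.out = nclasses + 1)))
  if (length(brks) < 2) {
    # All values identical: only a single class can be formed.
    return(factor(rep("C0", length(v))))
  }
  # One label per remaining interval, numbered from C0 as in the original code.
  cut(v, brks, include.lowest = TRUE,
      labels = paste0("C", seq_len(length(brks) - 1) - 1))
}

res <- classify_quantile_unique(x)
print(table(res))

# With Hmisc installed, cut2() accepts the number of quantile groups via g.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(table(Hmisc::cut2(x, g = 10)))
}
```

Note that the number of classes actually returned depends on how many distinct break points survive, so any downstream code should not assume exactly nclasses levels.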
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University