[R] what does cut(data, breaks=n) actually do?
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Dec 13 09:32:37 CET 2007
melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information. One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in binDist uses. When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins. Can anyone tell me how cut() decides where to cut?
>
>
This is one case where reading the actual R code is easier than
explaining what it does. From cut.default:
    if (length(breaks) == 1) {
        if (is.na(breaks) | breaks < 2)
            stop("invalid number of intervals")
        nb <- as.integer(breaks + 1)
        dx <- diff(rx <- range(x, na.rm = TRUE))
        if (dx == 0)
            dx <- rx[1]
        breaks <- seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = nb)
    }
so basically it takes the range, extends it slightly (by dx/1000 at each
end, where dx is the width of the range) and splits it into
<breaks> equally long segments.
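
To see this on concrete numbers, here is a small sketch that redoes the
break computation by hand, following the snippet quoted above. The data
vector and the names n, brks and counts are made up for illustration,
and the source of cut.default in your own R version may differ in its
details, so the breaks are passed to cut() explicitly rather than
relying on breaks=n.

```r
# Made-up data and number of bins, for illustration only
x <- c(1, 2, 3, 4, 10)
n <- 3

# Same arithmetic as in the quoted cut.default snippet
rx <- range(x, na.rm = TRUE)   # c(1, 10)
dx <- diff(rx)                 # 9
brks <- seq(rx[1] - dx/1000, rx[2] + dx/1000, length.out = n + 1)

brks         # n + 1 equally spaced break points
diff(brks)   # every bin has the same width, (dx + 2*dx/1000) / n

# Bin the data using the explicit breaks and count per bin
counts <- table(cut(x, breaks = brks))
counts
```

Note that the bins are equally wide, not equally populated: here the
first bin holds three of the five values.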
(For the sometimes more attractive option of splitting into groups
containing roughly equal numbers of observations, there is cut2 in the
Hmisc package, or use quantile() to compute the breaks.)
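
A minimal sketch of the quantile() route, using base R only. The data
and the names n, qbrks and groups are made up for illustration. One
caveat: with heavily tied data the quantiles can coincide, and cut()
will then complain that the breaks are not unique.

```r
# Made-up, deliberately skewed data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
n <- 3

# Break points at the 0, 1/3, 2/3 and 1 quantiles
qbrks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum inside the first bin
groups <- cut(x, breaks = qbrks, include.lowest = TRUE)
table(groups)                   # three observations in each bin

# For comparison: equal-width bins on the same data
table(cut(x, breaks = n))       # almost everything lands in the first bin
```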
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907