[R] what does cut(data, breaks=n) actually do?
Domenico Vistocco
vistocco at unicas.it
Thu Dec 13 10:17:20 CET 2007
cut(data, breaks=n)
splits the range of the data into n intervals of (approximately) the same
width; the bins need not contain the same number of observations.
The width used is obtained by:
(max(data) - min(data)) / n
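To make the formula concrete, here is a small sketch with a made-up, deliberately skewed vector (the values and the names `data`, `width`, `bins` and `counts` are only illustrative, not taken from the original question). It suggests that "same size" refers to the width of the intervals, not to the number of observations in each:

```r
# Illustrative data (hypothetical values, chosen to be skewed)
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
n <- 3

# Width of each bin: the range of the data divided by the number of bins
width <- (max(data) - min(data)) / n
width

# cut() assigns each value to one of n intervals of that width
bins <- cut(data, breaks = n)
levels(bins)

# Counts per bin: equal width does not imply equal counts
counts <- table(bins)
counts
```

If bins with roughly equal counts are wanted instead, explicit break points have to be passed to cut() rather than a single number.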
> x=rnorm(x)
> cut(x,breaks=3)
[1] (1.79,9.97] (-6.39,1.79] (9.97,18.2] (9.97,18.2] (-6.39,1.79]
[6] (1.79,9.97] (-6.39,1.79] (1.79,9.97] (-6.39,1.79] (-6.39,1.79]
Levels: (-6.39,1.79] (1.79,9.97] (9.97,18.2]
Then you have:
> 18.2-9.97
[1] 8.23
> 9.97-1.79
[1] 8.18
> 1.79+6.39
[1] 8.18
>
> (max(x)-min(x))/3
[1] 8.164187
I don't know the reason for the small differences (I am wondering about it myself).
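One possible explanation, going by the help page ?cut: when breaks is a single number, the range is divided into pieces of equal length and the outer limits are then moved away by 0.1% of the range, so that the extreme values fall inside the intervals; in addition, the labels are rounded to dig.lab = 3 significant digits. The sketch below rebuilds the breaks by hand under that reading. It uses a made-up vector rather than the x above (which was random), and `by_hand`, `auto_codes` and `hand_codes` are illustrative names only:

```r
# Illustrative data (hypothetical values; the x above was random)
x <- c(-6.3, -2.1, 0.4, 3.3, 7.8, 11.2, 18.2)
n <- 3

rx <- range(x)
dx <- diff(rx)

# Equal-width break points over the exact range of the data
by_hand <- seq(rx[1], rx[2], length.out = n + 1)

# Move the outer limits away by 0.1% of the range, as ?cut describes,
# so that min(x) and max(x) fall inside the half-open intervals (a, b]
by_hand[1] <- rx[1] - dx / 1000
by_hand[n + 1] <- rx[2] + dx / 1000
by_hand

# Compare the bin assignment with the one cut(x, breaks = n) produces
auto_codes <- as.integer(cut(x, breaks = n))
hand_codes <- as.integer(cut(x, breaks = by_hand))
identical(auto_codes, hand_codes)

# Labels are rounded to dig.lab significant digits (3 by default),
# so widths computed from the printed labels are only approximate
levels(cut(x, breaks = n))
levels(cut(x, breaks = n, dig.lab = 8))
```

Under this reading, subtracting the printed label endpoints mixes the rounding of the labels with the slight widening of the two outer bins, which would account for values such as 8.18 and 8.23 instead of 8.164187.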
I hope it is useful.
domenico
melissa cline wrote:
> Hello,
>
> I'm trying to bin a quantity into 2-3 bins for calculating entropy and
> mutual information. One of the approaches I'm exploring is the cut()
> function, which is what the mutualInfo function in binDist uses. When it's
> called in the format cut(data, breaks=n), it somehow splits the data into n
> distinct bins. Can anyone tell me how cut() decides where to cut?
>
> Thanks,
>
> Melissa
>
>
>
> ---------------------------------------------------------------
> Melissa Cline, Independent Investigator
> MCD Biology, UCSC