[R] a question about box counting
Rajarshi Guha
rxg218 at psu.edu
Mon Apr 4 23:46:23 CEST 2005
On Mon, 2005-04-04 at 14:22 -0400, Rajarshi Guha wrote:
> Hi,
> I have a set of x,y data points and each data point lies between (0,0)
> and (1,1). Of this set I have selected all those that lie in the lower
> triangle (of the plot of these points).
>
> What I would like to do is to divide the region (0,0) to (1,1) into
> cells of say, side = 0.01 and then count the number of cells that
> contain a point.
Thanks very much to Deepayan Sarkar, James Holtman and Ray Brownrigg for
very efficient (and elegant) solutions. I've summarized them below:
Deepayan Sarkar
A combination of cut and table/xtabs should do it, e.g.:
x <- runif(3000)
y <- runif(3000)
fx <- cut(x, breaks = seq(0, 1, length = 101))
fy <- cut(y, breaks = seq(0, 1, length = 101))
txy <- xtabs(~ fx + fy)
image(txy > 0)
sum(txy > 0)
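
To tie this back to the original question, here is a minimal sketch that first keeps only the points in the lower triangle and then counts the occupied cells. The seed, the reading of "lower triangle" as y <= x, and the names keep, brk and n_occupied are assumptions made for illustration; include.lowest = TRUE is added so that a value of exactly 0 would not be dropped as NA.

```r
set.seed(1)                          # assumed seed, only for reproducibility
x <- runif(3000)
y <- runif(3000)

keep <- y <= x                       # assumption: "lower triangle" means y <= x
brk  <- seq(0, 1, length.out = 101)  # 100 cells of side 0.01 along each axis

# Bin each coordinate; include.lowest = TRUE keeps a value of exactly 0
fx <- cut(x[keep], breaks = brk, include.lowest = TRUE)
fy <- cut(y[keep], breaks = brk, include.lowest = TRUE)

txy <- table(fx, fy)                 # 100 x 100 table of counts per cell
n_occupied <- sum(txy > 0)           # number of cells holding at least one point
n_occupied
```

Because both coordinates are cut with the same breaks, every occupied cell has a y index no larger than its x index, so the upper triangle of txy stays empty.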
---------------------------------------------------------
James Holtman
Here is a start. This creates a data frame and then divides the data up
into 10 segments (you wanted 100, so extend it) and then counts the
number in each cell.
> df <- data.frame(x=runif(100), y=runif(100)) # create data
> breaks <- seq(0,1,.1) # define breaks; you would use 0.01
> table(cut(df$x, breaks=breaks, labels=F),
+       cut(df$y, breaks=breaks, labels=F))
# use 'cut' to partition and then 'table' to count
     1 2 3 4 5 6 7 8 9 10
  1  0 2 0 1 0 3 0 1 0  0
  2  0 1 0 0 0 2 1 2 0  0
  3  0 1 0 0 3 0 2 2 1  2
  4  0 0 1 2 3 3 1 2 2  0
  5  3 1 2 2 1 2 1 1 1  0
  6  2 0 2 0 0 0 0 1 0  0
  7  0 1 1 1 2 1 1 1 2  1
  8  0 3 2 1 1 2 2 2 1  1
  9  0 0 2 2 0 1 2 0 2  2
  10 0 2 1 0 0 0 0 0 0  3
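
Extending this to cells of side 0.01 might look like the sketch below. The seed and the names cx, cy, counts and occupied are assumptions, not part of the original reply. One thing to watch: with labels=FALSE, cut returns plain integer codes, so table only shows rows and columns for codes that actually occur and the result can be smaller than 100 x 100. The count of non-zero entries is unaffected by that.

```r
set.seed(42)                                      # assumed seed, only for reproducibility
df <- data.frame(x = runif(100), y = runif(100))  # create data
breaks <- seq(0, 1, by = 0.01)                    # 100 cells per axis

# Integer cell codes (1..100) for each coordinate
cx <- cut(df$x, breaks = breaks, labels = FALSE)
cy <- cut(df$y, breaks = breaks, labels = FALSE)

counts   <- table(cx, cy)     # only codes that occur get a row or column
occupied <- sum(counts > 0)   # number of cells containing at least one point
occupied
```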
-----------------------------------------------------------------
Ray Brownrigg
Another significantly faster way (but not generating row/column names)
is:
x <- runif(3000)
y <- runif(3000)
ints <- 100
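myfun <- function(x, y, ints) {
  # integer cell index (0 .. ints-1) along each axis
  fx <- x %/% (1/ints)
  fy <- y %/% (1/ints)
  # combine into a single cell number (1 .. ints^2) and count per cell
  txy <- hist(fx + ints*fy + 1, breaks=0:(ints*ints), plot=FALSE)$counts
  dim(txy) <- c(ints, ints)  # reshape the counts into an ints x ints matrix
  return(txy)
}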
myfun(x, y, ints)
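
If only the number of occupied cells is needed, the full count matrix can be skipped. The following is a rough sketch along the same lines, not part of Ray's reply: the function name count_occupied is hypothetical, and it uses floor with a clamp so that a coordinate of exactly 1 lands in the last cell. Note that cells here are closed on the left, [a, b), whereas cut uses (a, b] by default, so points lying exactly on a cell boundary may be assigned differently.

```r
# Hypothetical helper: count the grid cells that contain at least one point
count_occupied <- function(x, y, ints) {
  # Integer cell index (0 .. ints-1) along each axis; the clamp puts a
  # value of exactly 1 into the last cell instead of out of range
  ix <- pmin(floor(x * ints), ints - 1)
  iy <- pmin(floor(y * ints), ints - 1)
  # Each cell gets a unique single number, so count the distinct ones
  length(unique(ix + ints * iy))
}

x <- runif(3000)
y <- runif(3000)
count_occupied(x, y, 100)
```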
-------------------------------------------------------------------
Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Q: Why did the mathematician name his dog "Cauchy"?
A: Because he left a residue at every pole.