[R-sig-Geo] Cumulative area distribution

Tue Oct 10 21:13:45 CEST 2006

 Roger, thanks for the tip on maptools, waaayyyy easier to get an ESRI grid
into R, than what I was doing. 

I've narrowed down the problem definition: 

I have a flow accumulation grid, in which each cell reports the number of
cells, upstream of that cell, flowing into that cell, with the minimum being
"1" or "itself". The max, in my case, is 183695 cells, flow into one cell
(the watershed mouth). 

In this grid, there are many cells with a value of 1 (I think 33000), many
more with a value of 2 (perhaps 10000), not so many with a value of 100
(maybe 50)...so forth. 

Coverting the number of cells to area is relative simple, as Roger
demonstrated: basin_area <- cell_count * 20 * 20

The Cumulative area distribution plot, is the plot of the probabilty that
the drainage area of one cell is greater than the largest drainage area
found among all the cells. This would be 99% for the cells with fac=1, and
.001 for the cells with fac=183695.

I think the best way to do this is with an ordinal ranking (and the way many
do it by hand), so that the low-flow area cells (of which there are a lot)
get a rank of 100, med-flow area cells get, say 50, and the high-flow area
cells get, 1. Then the probabability could be defined as rank/101 e.g.,
100/101 for the low flow areas, and 1/101 for the high flow areas. The the
plot would show such things as the breakpoints between convergent and
divergent flow (McNamara, 2006). 

 This is very easy to do in Excel, but...with the explosion of ultra-high
resolution DEMs with many cells in a DEM, this reduces Excel to the role of
balancing checkbooks (as it should be). Not to mention the benefit of having
this data as a table or matrix in R.....

I tried the function that Roger posted below (thanks Roger!), it worked
great ( I learned some new R tricks), but the resulting curve.....isn't a
CAD plot. My lack of experience in both R and programming have led me to
this useful mailing list in search of some adivce on how best to get a CAD
plot for flow accumulation girds with many thousands of cells. 

Thanks for any help you can provide!

#################
To: Thomas P. Colson
Subject: Re: [R] Probability of exceedance function question (fwd)

A final cut:

t1 <- ecdf(basin_area)
x <- seq(min(basin_area), max(basin_area), length=500) y <- 1 - t1(x)
plot(x, y, type="l", xaxs="i", yaxs="i", ylim=c(0, 1))

This uses the ecdf() subtracted from 1 to perhaps get closer to what you
need.

Roger

By the way, you can read an Arc ASCII grid into a SpatialGridDataFrame 
with function readAsciiGrid() in package maptools, so for you:

basin.map <- readAsciiGrid("c:/temp/area.asc", colname="basinID")
basins <- split(basin.map$basinID, basin.map$basinID)
length(basins) # sanity check
cell_count <- sapply(basins, length)
basin_area <- cell_count * 20 * 20
t1 <- ecdf(basin_area)
plot(t1, pch=".", axes=FALSE) # and set xlab=, ylab=, main=
axis(1)
box()
grid()
axis(2, at=axTicks(2), labels=rev(axTicks(2)))