[R] Probability of exceedance function question

Thomas P. Colson tpcolson at ncsu.edu
Sun Oct 8 23:21:44 CEST 2006


I am able to convert the flow accumulation grid into a drainage-area grid
(one value per pixel) and import it into R as an ASCII file. However,
plot(ecdf(...)) seems to plot the opposite of what I want: the curve starts
at probability 0 for drainage area 0. Shouldn't it be the other way around?

There are about 150,000 data points in these sets, and the ecdf curve plots
in about 15 seconds.


Could the problem be in how I'm importing the data from the ASCII grid? The
cell size is 20 ft and z is the (flow-weighted) drainage area for each cell.

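# Read the ASCII grid: skip the 6 header lines, treat -9999 as no-data
area <- read.table(file = "c:/temp/area.asc", sep = " ",
                   na.strings = "-9999", skip = 6)
# Drop the last column read from the file
area <- area[, -ncol(area)]
xLLcorner <- 1985649.0700408898
yLLcorner <- 841301.04004059616
cellsize <- 20
xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1))
xLRcorner <- xURcorner
xULcorner <- xLLcorner
yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1))
yURcorner <- yULcorner
yLRcorner <- yLLcorner
# y varies fastest, matching the column-by-column order of the flattened grid
coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -cellsize),
                       x = seq(xULcorner, xLRcorner, by = cellsize))
# Columns are already named y, x, z; renaming them c("x","y","z") would swap x and y
area <- data.frame(coordsa, z = as.vector(c(area, recursive = TRUE)))
# R is case-sensitive: plot(), not Plot()
plot(ecdf(area$z))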
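
For reference, ecdf() returns the non-exceedance probability P(A <= a), which
is why the curve rises from 0. The exceedance probability P(A > a) is its
complement. A minimal sketch, using a small made-up vector `a` in place of
area$z (the values and the names `Fn`, `xs` and `p_exceed` are illustrative
assumptions, not part of the original data):

```r
# Made-up drainage areas standing in for area$z (NA marks a no-data cell)
a <- c(400, 400, 800, 1200, 4000, 100000, NA)
a <- a[!is.na(a)]

Fn <- ecdf(a)              # Fn(x) = P(A <= x)
xs <- sort(unique(a))      # distinct areas, ascending
p_exceed <- 1 - Fn(xs)     # P(A > x) at each distinct area

# Log-scaled x axis, since drainage areas span several orders of magnitude
plot(xs, p_exceed, type = "s", log = "x",
     xlab = "drainage area", ylab = "P(A > a)")
```

With this form the curve starts high for the smallest area and falls to 0 at
the largest one.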


-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no] 
Sent: Sunday, October 08, 2006 4:37 PM
To: Thomas P. Colson
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Probability of exceedance function question

On Sun, 8 Oct 2006, Thomas P. Colson wrote:

> I'm trying to calculate a cumulative area distribution (graph) of 
> drainage areas. This is defined as P(A > A*). Simple in principle. I 
> can do this in excel, with "COUNTIF", which will count the number of 
> cells in the row "area" that have area A, then determine, for each 
> cell in the row "area", how many cells exceed that area, then dividing 
> that number by the total number of cells, which gives me the 
> probability that drainage area A exceeds drainage area A*.

Is this ecdf() of the vector or its suitable subset? If so, it runs very
fast even for large data sets. For plotting, bear in mind that you are
generating a lot of output, though:

> t0 <- runif(100000)
> system.time(t1 <- ecdf(t0))
[1] 0.222 0.022 0.248 0.000 0.000
> system.time(plot(t1, pch="."))
[1] 1.089 0.079 1.186 0.000 0.000

isn't at all bad!
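
If the amount of plot output matters, one possible way to reduce it is to
evaluate the ECDF only at its distinct values instead of drawing one point
per observation. A rough sketch, assuming the data contain many ties (the
rounding below is only there to create ties in the simulated data):

```r
set.seed(1)
# Rounding creates many tied values in the simulated data
t0 <- round(runif(100000), 3)
t1 <- ecdf(t0)

k <- knots(t1)   # the distinct observed values, in ascending order
# One step per distinct value rather than one point per observation
plot(k, t1(k), type = "s", xlab = "x", ylab = "Fn(x)")
```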

> 
> E.g., a drainage area of 6 sq meters (one DEM grid cell) has a high 
> probability of exceedance (.99), while a drainage area of 100,000 
> square meters has a low probability of exceedance (.001).
> 
> I wish to plot this relationship, and we all know that excel is not 
> the tool of choice when working with hundreds of thousands of records. 
> I'd like to port the CAD into a few R functions that I've already 
> developed for other tests as well.
> 
> So my challenge, in R, is to
> (1) count the number of rows in column "Area" that have AREA(*),
> 
> (2) determine, by row, how many rows have an area greater than the 
> area given in that one row
> 
> (3) divide step 2 by number of rows (how can I do a row count and port 
> that to a variable, as I have to do this on 10 datasets?)
> 
> Thanks for any advice you can offer on this endeavour
> 
> 
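
The three steps in the quoted message can also be sketched as a direct
rank-based count, without ecdf(). The helper name `exceedance` and the sample
values are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: for each cell, the fraction of cells with a larger area
exceedance <- function(a) {
  a <- a[!is.na(a)]                               # drop no-data cells
  n <- length(a)                                  # step 3: the row count
  n_greater <- n - rank(a, ties.method = "max")   # step 2: cells with strictly larger area
  data.frame(area = a, p_exceed = n_greater / n)
}

areas <- c(6, 6, 6, 24, 100000, NA)   # made-up sample data
counts <- table(areas)                # step 1: number of cells having each area
res <- exceedance(areas)
```

Because the row count is computed inside the function, the same call can be
repeated over several datasets, for example with lapply() over a list of
vectors.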

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


