[R] Splitting Area under curve into equal portions

Daniel Nordlund djnordlund at verizon.net
Thu Mar 26 09:39:56 CET 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Nathan S. 
> Watson-Haigh
> Sent: Wednesday, March 25, 2009 10:59 PM
> To: milton ruser
> Cc: r-help at r-project.org
> Subject: Re: [R] Splitting Area under curve into equal portions
> 
> Hi Milton,
> 
> Not quite, that would be an equal number of data points in 
> each colour group.
> What I want is an unequal number of points in each group such that:
> sum(work[group.members]) is approximately the same for each 
> group of data points.
> 
> In the meantime, I came up with the following, and took a 
> leaf out of your book
> with the colouring, for example:
> 
> <code>
> n <- 2002
> work <- vector()
> for(x in 1:(n-2)) {
>   work[x] <- ((n-1-x)*(n-x))/2
> }
> plot(work)
> 
> tasks <- vector('list')
> n_slaves <- 4  # assumed: undefined in the post as sent; 4 matches the "4 bits" mentioned below
> tasks_per_slave <- 1
> work_per_task <- sum(work) / (n_slaves * tasks_per_slave)
> 
> # Now define ranges of x of equal "work"
> block_start <- 1
> for(x in (1:(length(work)))) {
>   if(x == length(work)) {
>     # this will be the last block
>     tasks[[length(tasks)+1]] <- list(x=block_start:length(work))
>     break
>   }
>   work_in_block_to_x <- sum(work[block_start:(x)])
> 
>   if(work_in_block_to_x > work_per_task) {
>     # use this value of x as the chunk end
>     tasks[[length(tasks)+1]] <- list(x=block_start:x)
> 
>     # move the block_start position
>     block_start <- x+1
>   }
> }
> 
> colours <- vector()
> for(i in 1:length(tasks)) {
>   colours <- append(colours,rep(i,length(tasks[[i]]$x)))
> }
> 
> plot(work, col=colours)
> </code>
> 
> Essentially, the area under the line for each of the coloured 
> groups (i.e. the
> total work associated with those values of x) should be 
> approximately equal and
> I believe the above code achieves this. Just found the 
> cumsum() function. You
> could look at it this way:
> 
> <code>
> plot(cumsum(work), col=colours)
> </code>
> 
> The coloured groupings coincide with splitting the cumulative 
> total (y-axis)
> into 4 approximately equal bits.
> 
> There must be a nicer way to do this!
> Nathan
> 

Nathan,

Someone will probably come up with a more elegant way, but does this help?
slice() partitions work into n groups whose sums are approximately equal.
It returns a vector p holding the index of the last element of work[] in
each group except the last.  The first group can be indexed by 1:p[1], the
second by (p[1]+1):p[2], ... and the n-th group by (p[n-1]+1):N, where
N <- length(work).

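slice <- function(v, n){
  subtot <- floor(sum(v)/n)   # target sum for each group
  cumtot <- cumsum(v)         # running total of v
  p <- rep(0,n-1)
  # p[i] is the last index whose running total is still below i*subtot;
  # seq_len() avoids the 1:0 trap when n is 1
  for(i in seq_len(n-1)) p[i] <- max(which(cumtot < (subtot*i)))
  p
}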

#to break work into ten groups
slice(work,10)
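
If you would rather have a group label for every element of work (handy
for the col= argument in your plots), here is a rough sketch of the same
idea without the loop. The name slice_groups() is my own invention, not
something from a package, and I am assuming the values in v are positive
so that the running total never decreases. It uses the same thresholds as
slice() and lets findInterval() count how many of them each running total
has reached.

```r
# Hypothetical helper (name made up for this example): returns a group
# number, 1..n, for every element of v, using the same cut points as slice().
slice_groups <- function(v, n) {
  # thresholds at 1, 2, ..., n-1 times the target sum per group
  breaks <- floor(sum(v) / n) * seq_len(n - 1)
  # count the thresholds each running total has reached, then shift to 1-based
  findInterval(cumsum(v), breaks) + 1L
}

# Same work vector as in the original post, built without the loop
n <- 2002
x <- 1:(n - 2)
work <- (n - 1 - x) * (n - x) / 2

grp <- slice_groups(work, 4)
table(grp)               # number of points in each group (unequal)
tapply(work, grp, sum)   # total work in each group (roughly equal)

# colour the cumulative plot by group when run interactively
if (interactive()) plot(cumsum(work), col = grp)
```

The group totals can differ from one another by about the size of one
element of work at each boundary, so they are only approximately equal.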


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA



