[R] Splitting Area under curve into equal portions
Nathan S. Watson-Haigh
nathan.watson-haigh at csiro.au
Thu Mar 26 06:59:03 CET 2009
Hi Milton,
Not quite; that would give an equal number of data points in each colour group.
What I want is an unequal number of points in each group such that:
sum(work[group.members]) is approximately the same for each group of data points.
In the meantime, I came up with the following, taking a leaf out of your book
for the colouring:
<code>
n <- 2002
# work[x] is proportional to the number of calculations needed for x
work <- vector()
for(x in 1:(n-2)) {
  work[x] <- ((n-1-x)*(n-x))/2
}
plot(work)

tasks <- vector('list')
n_slaves <- 4          # number of slaves available (4 in my example)
tasks_per_slave <- 1
work_per_task <- sum(work) / (n_slaves * tasks_per_slave)

# Now define ranges of x of equal "work"
block_start <- 1
for(x in (1:(length(work)))) {
  if(x == length(work)) {
    # this will be the last block
    tasks[[length(tasks)+1]] <- list(x=block_start:length(work))
    break
  }
  work_in_block_to_x <- sum(work[block_start:(x)])
  if(work_in_block_to_x > work_per_task) {
    # use this value of x as the chunk end
    tasks[[length(tasks)+1]] <- list(x=block_start:x)
    # move the block_start position
    block_start <- x+1
  }
}

# one colour index per data point, matching the task it belongs to
colours <- vector()
for(i in 1:length(tasks)) {
  colours <- append(colours,rep(i,length(tasks[[i]]$x)))
}
plot(work, col=colours)
</code>
Essentially, the area under the line for each of the coloured groups (i.e. the
total work associated with those values of x) should be approximately equal and
I believe the above code achieves this. I have also just found the cumsum()
function, which gives another way to look at it:
<code>
plot(cumsum(work), col=colours)
</code>
The coloured groupings coincide with splitting the cumulative total (y-axis)
into 4 approximately equal bits.
There must be a nicer way to do this!
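
One possibly nicer route, offered as a sketch rather than something tested against the real function: since cumsum(work) is the running area under the curve, the chunk boundaries are simply the points where that running total crosses 1/4, 2/4 and 3/4 of sum(work), and findInterval() can look those up without a loop. The names n_chunks, cum_work, breaks, group and chunk_work are illustrative assumptions, not part of the code above, and work is rebuilt from the n = 2000 example so that the block runs on its own.

```r
n <- 2000
n_chunks <- 4                     # assumed: one chunk per slave

# Same values as the loop version: work[x] = sum(1:(n - x + 1))
x <- seq_len(n)
work <- (n - x + 1) * (n - x + 2) / 2

# Running total of work, i.e. the area under the curve up to each x
cum_work <- cumsum(work)

# Internal cut points at 1/4, 2/4 and 3/4 of the total work
breaks <- sum(work) * seq_len(n_chunks - 1) / n_chunks

# Assign each x by the running total *before* it, so the x that crosses
# a cut point stays in the earlier chunk
group <- findInterval(cum_work - work, breaks) + 1

tasks <- split(x, group)                 # list of x values per chunk
chunk_work <- tapply(work, group, sum)   # total work in each chunk
print(chunk_work)

plot(work, col = group)
```

Each element of tasks could then be handed to one slave. Setting n_chunks to n_slaves * tasks_per_slave should mirror the loop version, although the exact boundaries may differ by a point or two because the loop restarts its running sum at every block.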
Nathan
milton ruser wrote:
> Hi Nathan,
>
> I am not sure that I understood what you need, and
> I know that it is not an elegant solution, but it may
> do the job.
>
> n <- 1991
> work <- vector()
> for(x in 1:n) {
> work[x] <- sum(1:(n-x+1))
> }
> plot(work)
>
> number.groups <- 5
> last.i<-0
> number.groups.list<-NULL
> for (i in 1:(number.groups-1))
> {
> number.groups.list<-c(number.groups.list, rep(i,
> round(length(work)/number.groups,0)))
> }
> number.groups.list<-c(number.groups.list, rep(number.groups,
> (length(work)-length(number.groups.list)) ))
> aggregate(work, list(number.groups.list), sum)
> plot(work, col=number.groups.list)
>
> Regards a lot,
>
> miltinho
> brazil
>
> On Wed, Mar 25, 2009 at 9:48 PM, Nathan S. Watson-Haigh
> <nathan.watson-haigh at csiro.au> wrote:
>
> I have some data generated as follows:
>
> <code>
> n <- 2000
> work <- vector()
> for(x in 1:n) {
> work[x] <- sum(1:(n-x+1))
> }
> plot(work)
> </code>
>
> What I want to do
> -----------------
> I want to split work into a number of unequal chunks such that the sum of the
> values in each chunk is approximately equal.
>
>
>
> The numbers in "work" are proportional to the amount of work to be performed
> for each value of x by a function I've written, i.e. for each value of x there
> are work[x] * y calculations to be performed (where y is a constant).
>
> I've written a parallel version of my function where I simply assign z number
> of x values to each slave. This is not ideal, since a slave that gets the 1:z
> smallest values of x will take longer to compute than the (n-z+1):n set of x
> values. For example, if I have 4 slaves available:
>
> slave 1 processes x in 1:500
> slave 2 processes x in 501:1000
> slave 3 processes x in 1001:1500
> slave 4 processes x in 1501:2000
>
> This means the total work performed by each slave is:
>
> slave 1 sum(work[1:500]) = 771708500
> slave 2 sum(work[501:1000]) = 396458500
> slave 3 sum(work[1001:1500]) = 146208500
> slave 4 sum(work[1501:2000]) = 20958500
>
> Manually splitting work into chunks where the sum of the values for the chunks
> is approximately equal, I get the following:
>
> > sum(work[1:184])
> [1] 335533384
> > sum(work[185:415])
> [1] 334897871
> > sum(work[416:745])
> [1] 334672085
> > sum(work[746:2000])
> [1] 330230660
>
> I need to be able to do this automatically for any value of n, and I think I
> should be able to do this by calculating the area under the curve and slicing
> it into equally sized regions, but I don't really know how to get there from
> what I've said above!
>
> Cheers,
> Nathan
>
--
--------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia
Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
--------------------------------------------------------