[R] subtotal, submean, aggregate

Gabor Grothendieck ggrothendieck at gmail.com
Sun Feb 26 16:04:49 CET 2006


Yes, that must be it.  Probably best to issue a:

set.seed(1)

as part of the code when posting examples with random numbers.
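
A sketch of what that could look like with the data from the original post (the seed value 1 is arbitrary, and the actual counts drawn depend on R's random number generator, so none are shown here):

```r
set.seed(1)  # fix the RNG state so every reader gets the same draws
habitats <- rep(c("meadow", "forest", "meadow", "pasture"), c(10, 5, 12, 6))
observations <- rpois(length(habitats), 2)
transect <- data.frame(observations = observations, habitats = habitats)

# Re-seeding and drawing again reproduces exactly the same observations
set.seed(1)
identical(observations, rpois(length(habitats), 2))  # TRUE
```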

Also, here is a variation that uses rle, as Roger did, together with
some elements of the solution I posted:

runno <- with(rle(as.numeric(transect[,2])), rep(seq(along = lengths), lengths))
aggregate(transect[,1], list(obs = transect[,2], runno), sum)[,-2]
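
To make the intermediate step visible, here is a small sketch of the same two lines on made-up data (the values are invented for illustration, not the rpois() draws from the original post). It assumes the habitat column is a factor, as as.numeric() needs factor codes here; on a character column as.numeric() would return NA, in which case rle(as.character(transect[,2])) would be one alternative.

```r
# Made-up data for illustration (not the rpois() draws from the original post)
transect <- data.frame(
  observations = c(1, 2, 3, 4, 5, 6, 7, 8),
  habitats = factor(rep(c("meadow", "forest", "meadow", "pasture"), c(3, 2, 2, 1)))
)

# rle() finds runs of identical consecutive values; repeating each run's
# index by its length labels every row with the run it belongs to
runs <- rle(as.numeric(transect[, 2]))
runno <- rep(seq(along = runs$lengths), runs$lengths)
runno
# 1 1 1 2 2 3 3 4

# Aggregating on habitat and run number keeps the two meadow stretches apart;
# any summary function can be used in place of sum
subtotal <- aggregate(transect[, 1], list(obs = transect[, 2], runno), sum)[, -2]
submean  <- aggregate(transect[, 1], list(obs = transect[, 2], runno), mean)[, -2]
```

Here subtotal has one row per stretch (meadow, forest, meadow, pasture) in the order encountered, with sums 6, 9, 13 and 8.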


On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
> Yes, right. Checking some examples, all come out OK.
>
> > This does not give the
> > same as your example but I think there are some errors in your example
> > output.
>
> The 'errors' observed simply come from the seed in
> rpois(length(habitats),2)
> It is unlikely to be the same on your computer and mine...
>
> Cheers,
>
> Patrick
>
>
> Gabor Grothendieck wrote:
> > We are just comparing the difference to 0, so it does not matter if it's
> > positive or negative. All that matters is whether it's 0 or not.
> >
> > In fact, the runno you calculate with the abs is identical to the one
> > I posted without the abs:
> >
> > runno <- cumsum(c(TRUE, abs(diff(as.numeric(transect[,2])))!=0))
> > runno2 <- cumsum(c(TRUE, diff(as.numeric(transect[,2])))!=0)
> > identical(runno, runno2) # TRUE
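
To see why the sign is irrelevant, here is a small sketch with a made-up vector of factor codes (the values are hypothetical): diff() returns negative steps where the code decreases, but the comparison with 0 flags the same change points with or without abs().

```r
codes <- c(2, 2, 1, 1, 2, 3, 3)  # hypothetical factor codes along a transect

diff(codes)
# 0 -1 0 1 1 0

# Both comparisons start a new run wherever the code changes, up or down
runno  <- cumsum(c(TRUE, abs(diff(codes)) != 0))
runno2 <- cumsum(c(TRUE, diff(codes) != 0))
runno
# 1 1 2 2 3 4 4
identical(runno, runno2)  # TRUE
```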


> > On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
> > > Excellent! I had been messing with this problem since the early
> > > afternoon. Actually, the remaining discrepancy you noticed comes from
> > > negative differences in
> > > diff(as.numeric(transect[,2]))
> > > One can work around it using abs(diff(as.numeric(transect[,2]))). This
> > > gives:
> > >
> > > runno <- cumsum(c(TRUE, abs(diff(as.numeric(transect[,2])))!=0))
> > > aggregate(transect[,1], list(obs = transect[,2], runno = runno), sum)
> > >
> > > I did not know about this use of diff, which was the key point... and then
> > > cumsum for polishing. Really great and also elegant (concise). I like it!
> > >
> > > Thanks a lot!!!
> > >
> > > Cheers,
> > >
> > > Patrick


> > > Gabor Grothendieck wrote:
> > > > Create another variable that gives the run number and aggregate on
> > > > both the habitat and run number, removing the run number after
> > > > aggregating:
> > > >
> > > > runno <- cumsum(c(TRUE, diff(as.numeric(transect[,2])) !=0))
> > > > aggregate(transect[,1], list(obs = transect[,2], runno = runno), sum)[,-2]
> > > >
> > > > This does not give the same as your example but I think there are some
> > > > errors in your example output.

> > > > On 2/26/06, Patrick Giraudoux <patrick.giraudoux at univ-fcomte.fr> wrote:
> > > > > Dear All,
> > > > >
> > > > > I would like to make partial sums (or means or any other function) of
> > > > > the values in intervals along a sequence (spatial transect) where groups
> > > > > are defined.
> > > > >
> > > > > For instance:
> > > > >
> > > > > habitats<-rep(c("meadow","forest","meadow","pasture"),c(10,5,12,6))
> > > > > observations<-rpois(length(habitats),2)
> > > > > transect<-data.frame(observations=observations,habitats=habitats)
> > > > >
> > > > > aggregate() is not suitable for my purpose because I want a result
> > > > > respecting the order of the habitats encountered although they may have
> > > > > the same name (and not pooling each group on each level of the factor
> > > > > created). For instance, the output of the ideal function
> > > > > mynicefunction() would be something like:
> > > > >
> > > > > mynicefunction(transect$observations, by=list(transect$habitats),sum)
> > > > >
> > > > > meadow  16
> > > > > forest   9
> > > > > meadow  21
> > > > > pasture 17
> > > > >
> > > > > and not
> > > > >
> > > > > aggregate(transect$observations,by=list(transect$habitats),sum)
> > > > >   Group.1  x
> > > > > 1  forest  9
> > > > > 2  meadow 37
> > > > > 3 pasture 17
> > > > >
> > > > > Has anybody heard of such a function already written in R? If not, any
> > > > > idea how to write it simply and elegantly?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Patrick Giraudoux

