[R] understanding output of tapply/by cumsum

jim holtman jholtman at gmail.com
Tue Dec 7 13:45:42 CET 2010


You can also use 'split' to separate each group:

> split(d, list(d$a, d$c))
$`1.1`
   a b c    n  cum
1  1 1 1 11.1 11.1
6  1 2 1 12.1 23.2
11 1 3 1 13.1 36.3

$`2.1`
   a b c    n  cum
2  2 1 1 21.1 21.1
7  2 2 1 22.1 43.2
12 2 3 1 23.1 66.3

$`3.1`
   a b c    n  cum
3  3 1 1 31.1 31.1
8  3 2 1 32.1 63.2
13 3 3 1 33.1 96.3

$`4.1`
   a b c    n   cum
4  4 1 1 41.1  41.1
9  4 2 1 42.1  83.2
14 4 3 1 43.1 126.3

$`5.1`
   a b c    n   cum
5  5 1 1 51.1  51.1
10 5 2 1 52.1 103.2
15 5 3 1 53.1 156.3

$`1.2`
   a b c    n  cum
16 1 1 2 11.2 11.2
21 1 2 2 12.2 23.4
26 1 3 2 13.2 36.6

$`2.2`
   a b c    n  cum
17 2 1 2 21.2 21.2
22 2 2 2 22.2 43.4
27 2 3 2 23.2 66.6

$`3.2`
   a b c    n  cum
18 3 1 2 31.2 31.2
23 3 2 2 32.2 63.4
28 3 3 2 33.2 96.6

$`4.2`
   a b c    n   cum
19 4 1 2 41.2  41.2
24 4 2 2 42.2  83.4
29 4 3 2 43.2 126.6

$`5.2`
   a b c    n   cum
20 5 1 2 51.2  51.2
25 5 2 2 52.2 103.4
30 5 3 2 53.2 156.6

>
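
Note that split() only partitions the rows; the 'cum' column has to be in 'd' already. A minimal sketch of one way to add it, assuming the data frame 'd' from the original post below, is ave() with cumsum. This is one option and not necessarily how the column shown above was created:

```r
# rebuild the example data from the original post
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c

# running total of n within each (a, c) group;
# ave() returns the values in the original row order
d$cum <- ave(d$n, d$a, d$c, FUN = cumsum)

# inspect one group
d[d$a == 1 & d$c == 1, ]
```

Because ave() returns a vector as long as its input, the result can be assigned directly as a new column without reordering.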


On Tue, Dec 7, 2010 at 6:39 AM, Gerrit Draisma <gdraisma at xs4all.nl> wrote:
> Dear R-users,
>
> I have a dataset with categories and numbers.
> I would like to compute and add cumulative numbers
> to the dataset.
> I do not understand the structure of by(...) or
> tapply(...) output enough to handle it.
>
> Here is a small example:
> --------------
> d<-expand.grid(a=1:5,b=1:3,c=1:2)
> d$n = 10 * d$a + d$b +0.1* d$c
> Sn<-by(d$n,list(d$a,d$c),cumsum)
> str(Sn)
> ---------
> List of 10
>  $ : num [1:3] 11.1 23.2 36.3
>  $ : num [1:3] 21.1 43.2 66.3
>  $ : num [1:3] 31.1 63.2 96.3
>  $ : num [1:3]  41.1  83.2 126.3
>  $ : num [1:3]  51.1 103.2 156.3
>  $ : num [1:3] 11.2 23.4 36.6
>  $ : num [1:3] 21.2 43.4 66.6
>  $ : num [1:3] 31.2 63.4 96.6
>  $ : num [1:3]  41.2  83.4 126.6
>  $ : num [1:3]  51.2 103.4 156.6
>  - attr(*, "dim")= int [1:2] 5 2
>  - attr(*, "dimnames")=List of 2
>  ..$ : chr [1:5] "1" "2" "3" "4" ...
>  ..$ : chr [1:2] "1" "2"
>  - attr(*, "call")= language by.default(data = d$n, INDICES = list(d$a, d$c), FUN = cumsum)
>  - attr(*, "class")= chr "by"
> ---------
> # these give list(s) of one numerical vector each
> Sn[5,2]
> Sn[cbind(d$a,d$c)]
> # how to access the individual cumsum values?
> # and assign them to d$Sn?
> --------------
>
> Thanks,
> Gerrit.
>
> ---
> Gerrit Draisma
> Department of Public Health
> Erasmus MC, University Medical Center Rotterdam
> Room AE-235
> P.O. Box 2040 3000 CA  Rotterdam The Netherlands
> Phone: +31 10 7043787 Fax: +31 10 7038474
> http://mgzlx4.erasmusmc.nl/pwp/?gdraisma
>
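
On the question of getting at the individual values in the by() result: single brackets return a list, while double brackets return the numeric vector itself. A rough sketch, assuming the 'd' and 'Sn' objects from the example above; the loop over dimnames is only one illustrative way to copy the values back, and it relies on by() keeping the rows of each group in their original order:

```r
d <- expand.grid(a = 1:5, b = 1:3, c = 1:2)
d$n <- 10 * d$a + d$b + 0.1 * d$c
Sn <- by(d$n, list(d$a, d$c), cumsum)

# [[ ]] with two indices extracts the numeric vector for a = 5, c = 2
Sn[[5, 2]]

# copy each group's cumulative sums back into the matching rows of d
d$Sn <- NA_real_
for (i in dimnames(Sn)[[1]]) {
  for (j in dimnames(Sn)[[2]]) {
    rows <- d$a == i & d$c == j   # rows belonging to group (i, j)
    d$Sn[rows] <- Sn[[i, j]]
  }
}
```

The dimnames are character strings, so the comparisons d$a == i and d$c == j work through coercion to character.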



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


