[R] how to calculate another vector based on the data from a combination of two factors
Marc Schwartz
marc_schwartz at comcast.net
Mon Nov 17 04:39:21 CET 2008
on 11/16/2008 08:12 PM jeffc wrote:
> Hi,
>
> I have a data set similar to the following
>
> State Gender Quantity
> TX Male 1
> NY Female 2
> TX Male 3
> NY Female 4
>
>
> I need to calculate cumulative sum of the quantity by State and Gender. The
> expected output is
> State Gender Quantity CumQuantity
> TX Male 1 1
> TX Male 3 4
> NY Female 2 2
> NY Female 4 6
>
> I highly appreciate if someone can give me some hints on solving that in R.
>
> Hao
I would verify this, but something along the lines of the following:
> DF
State Gender Quantity
1 TX Male 1
2 NY Female 2
3 TX Male 3
4 NY Female 4
do.call(rbind,
lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
function(x) cbind(x, cumsum(x$Quantity))))
which yields:
State Gender Quantity cumsum(x$Quantity)
NY.Female.2 NY Female 2 2
NY.Female.4 NY Female 4 6
TX.Male.1 TX Male 1 1
TX.Male.3 TX Male 3 4
To take this step by step:
First, split() DF by the two factors:
> split(DF, list(DF$State, DF$Gender), drop = TRUE)
$NY.Female
State Gender Quantity
2 NY Female 2
4 NY Female 4
$TX.Male
State Gender Quantity
1 TX Male 1
3 TX Male 3
Pass that to lapply(), in which we do the cumsum() and cbind():
> lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
function(x) cbind(x, cumsum(x$Quantity)))
$NY.Female
State Gender Quantity cumsum(x$Quantity)
2 NY Female 2 2
4 NY Female 4 6
$TX.Male
State Gender Quantity cumsum(x$Quantity)
1 TX Male 1 1
3 TX Male 3 4
Pass that to do.call() to rbind() the results together:
> do.call(rbind,
lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
function(x) cbind(x, cumsum(x$Quantity))))
State Gender Quantity cumsum(x$Quantity)
NY.Female.2 NY Female 2 2
NY.Female.4 NY Female 4 6
TX.Male.1 TX Male 1 1
TX.Male.3 TX Male 3 4
See ?split, ?do.call, ?rbind and ?cumsum.
If you want the exact row ordering as you had it in your post, you can
alter the factor levels, otherwise they will be sorted by alpha (eg. NY
before TX and Female before Male).
HTH,
Marc Schwartz
More information about the R-help
mailing list