[R] difference
P Tennant
philipt900 at iinet.net.au
Sun Oct 30 02:57:06 CET 2016
Hi,
As Jeff said, more than one grouping variable can be supplied, and there
is an example at the bottom of the help page for ave(). The same goes
for by(), but there the order in which you supply the grouping variables
matters. Whichever grouping variable is supplied first to by() will
change its levels first in the output sequence. You can see from your
dataset:
d2 <- data.frame(city=rep(1:2, each=6),
                 year=c(rep(2001, 3), rep(2002, 3), rep(2001, 3), rep(2002, 3)),
                 num=c(25,75,150,35,65,120,25,95,150,35,110,120))
d2
#    city year num
# 1     1 2001  25
# 2     1 2001  75
# 3     1 2001 150
# 4     1 2002  35
# 5     1 2002  65
# 6     1 2002 120
# 7     2 2001  25
# 8     2 2001  95
# 9     2 2001 150
# 10    2 2002  35
# 11    2 2002 110
# 12    2 2002 120
that `year' cycles through its levels first as you move down the table,
and then `city' changes. You want your new column to align with this
sequence, so `year' has to be the first grouping variable given to by().
If you put `city' first instead, you won't get the sequence reflected in
your dataset:
by(d2$num, d2[c('city', 'year')], function(x) x - x[1])
# city: 1
# year: 2001
# [1] 0 50 125
# -----------------------------
# city: 2
# year: 2001
# [1] 0 70 125
# -----------------------------
# city: 1
# year: 2002
# [1] 0 30 85
# -----------------------------
# city: 2
# year: 2002
# [1] 0 75 85
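For comparison, here is a minimal sketch of the by() call with `year' supplied first, so that the groups come out in the same order as the rows of d2. It assumes d2 is already sorted by city and then year, as in the example; the name `res' and the use of unlist() to flatten the result into a column are my own choices for illustration, not something taken from the help page.

```r
# Rebuild the example data from this post.
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = rep(rep(c(2001, 2002), each = 3), times = 2),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# 'year' is supplied first, so it changes its levels first in the output,
# which is the same order in which the groups appear down the rows of d2.
res <- by(d2$num, d2[c("year", "city")], function(x) x - x[1])

# Flatten the per-group vectors into a single column.
# Assumption: d2 is sorted by city, then year; otherwise the values
# would not line up with the rows.
d2$diff <- unlist(res, use.names = FALSE)
d2$diff
```

This only lines up because the rows are already grouped in that order, which is why the approach is fragile.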
In contrast to using by() as I've suggested, using match() to create
indices that flag when a new `city/year' category is encountered seems a
more explicit, secure way to do the calculation. Adapting an earlier
solution provided in this thread:
year.city <- with(d2, interaction(year, city))
indexOfFirstYearCity <- match(year.city, year.city)
indexOfFirstYearCity
# [1] 1 1 1 4 4 4 7 7 7 10 10 10
d2$diff <- d2$num - d2$num[indexOfFirstYearCity]
d2
#    city year num diff
# 1     1 2001  25    0
# 2     1 2001  75   50
# 3     1 2001 150  125
# 4     1 2002  35    0
# 5     1 2002  65   30
# 6     1 2002 120   85
# 7     2 2001  25    0
# 8     2 2001  95   70
# 9     2 2001 150  125
# 10    2 2002  35    0
# 11    2 2002 110   75
# 12    2 2002 120   85
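For completeness, a sketch of the ave() route that Jeff pointed to, with both grouping variables supplied. ave() returns its result in the original row order, so the order of the grouping variables should not matter here. The column names `diff_ave' and `diff_match' are made up for this illustration, and the match() lines simply repeat the calculation above so the two can be compared.

```r
# Rebuild the example data from this post.
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = rep(rep(c(2001, 2002), each = 3), times = 2),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# Within each city/year group, subtract the first value of the group.
# FUN must be passed by name, because ave() treats unnamed arguments
# after the first as grouping variables.
d2$diff_ave <- ave(d2$num, d2$city, d2$year, FUN = function(x) x - x[1])

# The match() version, for comparison.
year.city <- with(d2, interaction(year, city))
d2$diff_match <- d2$num - d2$num[match(year.city, year.city)]

# Both approaches should agree row by row.
all.equal(d2$diff_ave, d2$diff_match)
```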
Philip
On 29/10/2016 3:15 PM, Jeff Newmiller wrote:
> Now would be an excellent time to read the help page for ?ave. You can specify multiple grouping variables.