# [R] Yearly aggregates and matrices

Gabor Grothendieck ggrothendieck at gmail.com
Sun Apr 10 05:48:23 CEST 2011

```
On Sat, Apr 9, 2011 at 11:45 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Sat, Apr 9, 2011 at 5:14 AM, mathijsdevaan <mathijsdevaan at gmail.com> wrote:
>> Hi,
>>
>> I need to perform calculations on subsets of a data frame:
>>
>> DF = data.frame(read.table(textConnection("    A  B  C  D  E  F
>> 1 a  1995  0  4  1
>> 2 a  1997  1  1  3
>> 3 b  1995  3  7  0
>> 4 b  1996  1  2  3
>> 5 b  1997  1  2  3
>> 6 b  1998  6  0  0
>> 7 b  1999  3  7  0
>> 8 c  1997  1  2  3
>> 9 c  1998  1  2  3
>> 10 c  1999  6  0  0
>> 11 d  1999  3  7  0
>> 12 e  1995  1  2  3
>> 13 e  1998  1  2  3
>> 14 e  1999  6  0  0"),head=TRUE,stringsAsFactors=FALSE))
>>
>> I'd like to create new dataframes for each unique year in which for each
>> value of A, the values of D, E and F are summed over the last 3 years (e.g.
>> 1998 = 1998, 1997, 1996):
>> Question 1: How do I go from DF to newDFyear?
>>
>> Examples:
>>
>> newDF1995
>> B  D  E  F
>> a  0  4  1
>> b  3  7  0
>> e  1  2  3
>>
>> newDF1998
>> B  D  E  F
>> a  1  1  3
>> b  8  4  6
>> c  2  4  6
>> e  1  2  3
>>
>> Then, for each new DF I need to generate a square matrix after doing the
>> following:
>>
>> newDF1998$G<-newDF1998$D + newDF1998$E + newDF1998$F
>> newDF1998$D<-newDF1998$D/newDF1998$G
>> newDF1998$E<-newDF1998$E/newDF1998$G
>> newDF1998$F<-newDF1998$F/newDF1998$G
>> newDF1998<-newDF1998[,c(-5)]
>>
>> newDF1998
>> B  D  E  F
>> a  0.2  0.2  0.6
>> b  0.4  0.2  0.3
>> c  0.2  0.3  0.5
>> e  0.2  0.3  0.5
>>
>> Question 2: How do I go from newDF1998 to a matrix
>>
>>  a  b  c  e
>> a
>> b
>> c
>> e
>>
>> in which Cell ab = (0.2*0.4 + 0.2*0.2 + 0.6*0.3)/(((0.2*0.2 + 0.2*0.2 +
>> 0.6*0.6)^0.5) * ((0.4*0.4 + 0.2*0.2 + 0.3*0.3)^0.5)) = 0.84
>
> First we use read.zoo to reform DF into a multivariate time series and
> use rollapply (where we have used the devel version of zoo since it
> supports the partial= argument on rollapply).  We then reform each
> resulting row into a matrix converting each row of each matrix to
> proportions.  Finally we form the desired scaled cross product.
>
> # devel version of zoo
> install.packages("zoo", repos = "http://r-forge.r-project.org")
> library(zoo)
>
> z <- read.zoo(DF, split = 2, index = 3, FUN = identity)
>
> sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
> r <- rollapply(z, 3,  sum.na, align = "right", partial = TRUE)
>
> newDF <- lapply(1:nrow(r), function(i)
>        prop.table(na.omit(matrix(r[i,], nc = 4, byrow = TRUE,
>                dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
> names(newDF) <- time(z)
>
> lapply(mats, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))

mats in the last line should be newDF:

lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

```
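For readers who would rather not depend on zoo, the two questions can also be sketched in base R, assuming the data exactly as posted. `window_sums()` and `cosine_matrix()` are hypothetical helper names introduced here, not part of zoo or of the thread. The 3-year window is built with `aggregate()` over the year values in column C, which is a swapped-in technique rather than the `rollapply()` approach above; the final step uses the same idea as the corrected line (scale each row to unit length, then `tcrossprod()`).

```r
# Data as posted: A is a row id, B the group, C the year, D/E/F the values.
DF <- read.table(text = "
A B    C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (Question 1): sum D, E, F per group over `year` and the
# `width - 1` preceding years. The window is defined by year value, not by row.
window_sums <- function(df, year, width = 3) {
  sub <- df[df$C > year - width & df$C <= year, ]
  aggregate(cbind(D, E, F) ~ B, data = sub, FUN = sum)
}

# Hypothetical helper (Question 2): row proportions, then pairwise cosine
# similarity between groups.
cosine_matrix <- function(sums) {
  m <- as.matrix(sums[, c("D", "E", "F")])
  rownames(m) <- sums$B
  p <- prop.table(m, 1)            # each row sums to 1 (the D/G, E/G, F/G step)
  unit <- p / sqrt(rowSums(p^2))   # scale every row to unit length
  tcrossprod(unit)                 # unit %*% t(unit): pairwise cosines
}

s1998 <- window_sums(DF, 1998)
cos1998 <- cosine_matrix(s1998)
round(cos1998, 2)

# One matrix per year, named by year.
years <- sort(unique(DF$C))
all_cos <- lapply(setNames(years, years),
                  function(y) cosine_matrix(window_sums(DF, y)))
```

Because cosine similarity ignores the scale of each row, dividing by G first does not change the result; `cos1998["a", "b"]` comes out at about 0.84, matching the value in the question. Unlike `rollapply(z, 3, ...)`, which counts rows of the series, this window is defined by year values; the two agree here only because every year from 1995 to 1999 occurs in the data.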