[R] Yearly aggregates and matrices
mathijsdevaan
mathijsdevaan at gmail.com
Wed Apr 27 20:03:10 CEST 2011
Hi,
Is there a more memory-efficient alternative to "z <- read.zoo(DF, split = 2,
index = 3, FUN = identity)" and "r <- rollapply(z, 3, sum.na, align = "right",
partial = TRUE)"? I am trying to use the script below on data in which the
split column (B) has about 300000 unique cases, and I am getting a memory
allocation error. Thanks!
# devel version of zoo
install.packages("zoo", repos = "http://r-forge.r-project.org")
DF = data.frame(read.table(textConnection(" A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0"),head=TRUE,stringsAsFactors=FALSE))
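library(zoo)

# Reshape to a wide zoo series indexed by year (column 3), split by case (column 2)
z <- read.zoo(DF, split = 2, index = 3, FUN = identity)

# Sum that ignores NAs but returns NA when every value in the window is NA
sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA

# Trailing 3-year sums; partial = TRUE keeps the shorter windows at the start
r <- rollapply(z, 3, sum.na, align = "right", partial = TRUE)

# One matrix of row proportions (D, E, F) per year, dropping cases with no data
newDF <- lapply(1:nrow(r), function(i)
  prop.table(na.omit(matrix(r[i,], nc = 4, byrow = TRUE,
    dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
names(newDF) <- time(z)

# Scaled cross product (cosine similarity) between cases within each year
lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))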
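
For comparison, here is a rough base-R sketch of the aggregation step only, kept in long format so that no wide year-by-case matrix is built. It is an assumption-laden alternative, not the zoo approach quoted below: the names window, expanded, sums and target are made up for illustration, it treats the window as 3 calendar years, and it has only been checked against the toy data.

```r
# Toy data from this post: B = case, C = year
DF <- read.table(text = " A B C D E F
1 a 1995 0 4 1
2 a 1997 1 1 3
3 b 1995 3 7 0
4 b 1996 1 2 3
5 b 1997 1 2 3
6 b 1998 6 0 0
7 b 1999 3 7 0
8 c 1997 1 2 3
9 c 1998 1 2 3
10 c 1999 6 0 0
11 d 1999 3 7 0
12 e 1995 1 2 3
13 e 1998 1 2 3
14 e 1999 6 0 0", header = TRUE, stringsAsFactors = FALSE)

window <- 3  # hypothetical name: years per window, current year included

# A row observed in year y contributes to target years y, y + 1, ..., y + window - 1
expanded <- do.call(rbind, lapply(0:(window - 1), function(k)
  transform(DF, target = C + k)))
expanded <- expanded[expanded$target <= max(DF$C), ]

# Sum D, E, F per case and target year, then split into one data frame per year
sums <- aggregate(cbind(D, E, F) ~ B + target, data = expanded, FUN = sum)
newDF <- split(sums[, c("B", "D", "E", "F")], sums$target)

newDF[["1998"]]
```

On the toy data, newDF[["1995"]] and newDF[["1998"]] hold the same sums as the newDF1995 and newDF1998 examples in the quoted question. This only addresses the aggregation; the square matrix per year still has one row and one column per case, so it grows quadratically with the number of cases.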
Gabor Grothendieck wrote:
>
> On Sat, Apr 9, 2011 at 5:14 AM, mathijsdevaan
> <mathijsdevaan at gmail.com> wrote:
>> Hi,
>>
>> I need to perform calculations on subsets of a data frame:
>>
>> DF = data.frame(read.table(textConnection(" A B C D E F
>> 1 a 1995 0 4 1
>> 2 a 1997 1 1 3
>> 3 b 1995 3 7 0
>> 4 b 1996 1 2 3
>> 5 b 1997 1 2 3
>> 6 b 1998 6 0 0
>> 7 b 1999 3 7 0
>> 8 c 1997 1 2 3
>> 9 c 1998 1 2 3
>> 10 c 1999 6 0 0
>> 11 d 1999 3 7 0
>> 12 e 1995 1 2 3
>> 13 e 1998 1 2 3
>> 14 e 1999 6 0 0"),head=TRUE,stringsAsFactors=FALSE))
>>
>> I'd like to create a new data frame for each unique year in which, for each
>> value of B, the values of D, E and F are summed over the last 3 years
>> (e.g. 1998 = 1998 + 1997 + 1996):
>> Question 1: How do I go from DF to newDFyear?
>>
>> Examples:
>>
>> newDF1995
>> B D E F
>> a 0 4 1
>> b 3 7 0
>> e 1 2 3
>>
>> newDF1998
>> B D E F
>> a 1 1 3
>> b 8 4 6
>> c 2 4 6
>> e 1 2 3
>>
>> Then, for each new DF I need to generate a square matrix after doing the
>> following:
>>
>> newDF1998$G<-newDF1998$D + newDF1998$E + newDF1998$F
>> newDF1998$D<-newDF1998$D/newDF1998$G
>> newDF1998$E<-newDF1998$E/newDF1998$G
>> newDF1998$F<-newDF1998$F/newDF1998$G
>> newDF1998<-newDF1998[,c(-5)]
>>
>> newDF1998
>> B D E F
>> a 0.2 0.2 0.6
>> b 0.4 0.2 0.3
>> c 0.2 0.3 0.5
>> e 0.2 0.3 0.5
>>
>> Question 2: How do I go from newDF1998 to a matrix
>>
>> a b c e
>> a
>> b
>> c
>> e
>>
>> in which Cell ab = (0.2*0.4 + 0.2*0.2 + 0.6*0.3) / ((0.2*0.2 + 0.2*0.2 +
>> 0.6*0.6)^0.5 * (0.4*0.4 + 0.2*0.2 + 0.3*0.3)^0.5) = 0.84
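
As a quick arithmetic check of the cell ab formula, the sketch below takes the rounded rows a and b of newDF1998 and divides their dot product by the product of their Euclidean norms. The names a, b and cell_ab are only illustrative.

```r
# Rounded proportions for cases a and b, copied from newDF1998 in the question
a <- c(D = 0.2, E = 0.2, F = 0.6)
b <- c(D = 0.4, E = 0.2, F = 0.3)

# Dot product divided by the product of the two Euclidean norms
cell_ab <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
round(cell_ab, 2)
```

The result rounds to 0.84, the value given in the question.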
>
> First we use read.zoo to reshape DF into a multivariate time series and
> then apply rollapply (we use the devel version of zoo because it
> supports the partial= argument of rollapply). We then reshape each
> resulting row into a matrix and convert each row of each matrix to
> proportions. Finally we form the desired scaled cross product.
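
To make the last two steps concrete, here is a small sketch on the 1998 sums from the question, typed in by hand rather than produced by rollapply. prop.table(m, 1) turns each row into proportions, and dividing each row by its Euclidean norm before tcrossprod gives the scaled cross product. The names m, p and S are illustrative only.

```r
# 3-year sums for 1998, copied from the newDF1998 example in the question
m <- matrix(c(1, 1, 3,
              8, 4, 6,
              2, 4, 6,
              1, 2, 3),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "e"), c("D", "E", "F")))

# Row proportions: each row now sums to 1
p <- prop.table(m, 1)

# Scale each row to unit length, then take all pairwise row dot products
S <- tcrossprod(p / sqrt(rowSums(p^2)))
round(S, 2)
```

The a-b entry comes out near 0.84, and c and e get 1 because their rows are proportional to each other.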
>
> # devel version of zoo
> install.packages("zoo", repos = "http://r-forge.r-project.org")
> library(zoo)
>
> z <- read.zoo(DF, split = 2, index = 3, FUN = identity)
>
> sum.na <- function(x) if (any(!is.na(x))) sum(x, na.rm = TRUE) else NA
> r <- rollapply(z, 3, sum.na, align = "right", partial = TRUE)
>
> newDF <- lapply(1:nrow(r), function(i)
> prop.table(na.omit(matrix(r[i,], nc = 4, byrow = TRUE,
> dimnames = list(unique(DF$B), names(DF)[-2:-3]))[, -1]), 1))
> names(newDF) <- time(z)
>
> lapply(newDF, function(mat) tcrossprod(mat / sqrt(rowSums(mat^2))))