[R] DATA SUMMARIZING and REPORTING
arun
smartpink111 at yahoo.com
Thu Jul 31 06:51:52 CEST 2014
With more than one ID_CASE, you may try:
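xN <- x
xN$ID_CASE <- "CB27A"  # create a second ID_CASE with the same data, for testing
x <- rbind(x, xN)
# For each ID_CASE, build every 3-month window (month, month + 2) and summarise it.
# Months are yyyymm numbers in MTH_SUPPORT; rows are assumed to be in month order.
# Note: y + 2 on yyyymm numbers only works while a window stays within one calendar year.
res1 <- do.call(rbind, lapply(split(x, x$ID_CASE), function(.x) {
  indx <- with(.x, t(sapply(min(MTH_SUPPORT):(max(MTH_SUPPORT) - 2),
                            function(y) c(y, y + 2))))
  do.call(rbind, apply(indx, 1, function(.indx) {
    x1 <- .x[with(.x, MTH_SUPPORT >= .indx[1] & MTH_SUPPORT <= .indx[2]), ]
    Period <- paste(.indx[1], .indx[2], sep = "-")
    x2 <- within(x1, {
      Paid <- sum(A3)/(sum(A1) + sum(A2))
      No.ofChange <- sum(ATT_1[-1] != ATT_1[-length(ATT_1)])  # consecutive rows that differ
    })
    data.frame(ID_CASE = .x$ID_CASE[1L], Period, No.ofChange = x2$No.ofChange[1L],
               Paid = x2$Paid[1L], stringsAsFactors = FALSE)
  }))
}))
row.names(res1) <- 1:nrow(res1)  # replace the "CB26A.1"-style row names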
> res1
ID_CASE Period No.ofChange Paid
1 CB26A 201302-201304 2 0.4143646
2 CB26A 201303-201305 2 0.4452450
3 CB26A 201304-201306 1 0.4444444
4 CB26A 201305-201307 2 0.4607407
5 CB26A 201306-201308 1 0.4617737
6 CB26A 201307-201309 1 0.4513274
7 CB26A 201308-201310 1 0.4613779
8 CB27A 201302-201304 2 0.4143646
9 CB27A 201303-201305 2 0.4452450
10 CB27A 201304-201306 1 0.4444444
11 CB27A 201305-201307 2 0.4607407
12 CB27A 201306-201308 1 0.4617737
13 CB27A 201307-201309 1 0.4513274
14 CB27A 201308-201310 1 0.4613779
A.K.
On Thursday, July 31, 2014 12:34 AM, arun <smartpink111 at yahoo.com> wrote:
For the example, you gave:
x  ## the dataset (a single ID_CASE)
# One row per 3-month window: first and last month as yyyymm numbers
indx <- t(sapply(min(x$MTH_SUPPORT):(max(x$MTH_SUPPORT) - 2), function(m) c(m, m + 2)))
res <- do.call(rbind, apply(indx, 1, function(.indx) {
  x1 <- x[x$MTH_SUPPORT >= .indx[1] & x$MTH_SUPPORT <= .indx[2], ]  # rows in this window
  Period <- paste(.indx[1], .indx[2], sep = "-")
  No.ofChange <- sum(x1$ATT_1[-1] != x1$ATT_1[-length(x1$ATT_1)])  # changes between consecutive rows
  Paid <- with(x1, sum(A3)/(sum(A1) + sum(A2)))
  data.frame(ID_CASE = x$ID_CASE[1L], Period, No.ofChange, Paid, stringsAsFactors = FALSE)
}))
res
ID_CASE Period No.ofChange Paid
1 CB26A 201302-201304 2 0.4143646
2 CB26A 201303-201305 2 0.4452450
3 CB26A 201304-201306 1 0.4444444
4 CB26A 201305-201307 2 0.4607407
5 CB26A 201306-201308 1 0.4617737
6 CB26A 201307-201309 1 0.4513274
7 CB26A 201308-201310 1 0.4613779
With multiple ID_CASE values, either split the dataset by ID_CASE or use one of the grouping functions before applying this.
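A minimal sketch of the grouping-function route, using base R's by() in place of split()/lapply(). The helper name period_stats and the two-ID toy data are hypothetical; months are treated as plain yyyymm numbers, as in the code above, so the windows are assumed not to cross a year boundary and each ID is assumed to span at least 3 months.

```r
# Toy data (hypothetical): two IDs, months as yyyymm numbers, rows in month order
x <- data.frame(
  ID_CASE     = rep(c("CB26A", "CB27A"), each = 4),
  MTH_SUPPORT = rep(c(201302, 201302, 201303, 201304), 2),
  ATT_1       = rep(c(1, 0, 0, 1), 2),
  A1          = rep(c(146, 140, 128, 146), 2),
  A2          = rep(c(42, 50, 36, 36), 2),
  A3          = rep(c(74, 77, 77, 72), 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per 3-month window of a single ID_CASE
period_stats <- function(d) {
  starts <- min(d$MTH_SUPPORT):(max(d$MTH_SUPPORT) - 2)
  do.call(rbind, lapply(starts, function(s) {
    w <- d[d$MTH_SUPPORT >= s & d$MTH_SUPPORT <= s + 2, ]
    data.frame(ID_CASE     = d$ID_CASE[1L],
               Period      = paste(s, s + 2, sep = "-"),
               No.ofChange = sum(diff(w$ATT_1) != 0),  # consecutive rows that differ
               Paid        = sum(w$A3) / (sum(w$A1) + sum(w$A2)),
               stringsAsFactors = FALSE)
  }))
}

# by() calls period_stats once per ID_CASE; rbind stacks the pieces
res2 <- do.call(rbind, by(x, x$ID_CASE, period_stats))
row.names(res2) <- NULL
res2
```

by() hands each ID_CASE subset to the function, so the per-ID logic stays the same as in the single-ID version.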
A.K.
On Wednesday, July 30, 2014 8:48 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
Hi R-helpers,
I have a data frame like this:
ID_CASE YEAR_MTH ATT_1  A1 A2 A3
  CB26A   201302     1 146 42 74
  CB26A   201302     0 140 50 77
  CB26A   201303     0 128 36 77
  CB26A   201304     1 146 36 72
  CB26A   201305     1 134 36 80
  CB26A   201305     0 148 30 80
  CB26A   201306     0 134 20 72
  CB26A   201307     1 125 48 79
  CB26A   201309     0 122 44 74
  CB26A   201310     1 126 37 72
  CB26A   201310     1 107 43 75
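To experiment with the table above, it can be read back into R with read.table(text = ...). A sketch; the object name dat is an assumption, and the month column keeps the poster's name YEAR_MTH (the answers above refer to the month column as MTH_SUPPORT).

```r
# Rebuild the example data from plain text (object name `dat` is hypothetical)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID_CASE YEAR_MTH ATT_1  A1 A2 A3
  CB26A   201302     1 146 42 74
  CB26A   201302     0 140 50 77
  CB26A   201303     0 128 36 77
  CB26A   201304     1 146 36 72
  CB26A   201305     1 134 36 80
  CB26A   201305     0 148 30 80
  CB26A   201306     0 134 20 72
  CB26A   201307     1 125 48 79
  CB26A   201309     0 122 44 74
  CB26A   201310     1 126 37 72
  CB26A   201310     1 107 43 75
")
str(dat)  # 11 rows, 6 columns
```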
I want a final data frame that looks like this:
ID_CASE        Period No.ofChange    %Paid
  CB26A 201302-201304           2 0.414365
  CB26A 201303-201305           2 0.445245
  CB26A 201304-201306           1 0.444444
  CB26A 201305-201307           2 0.460741
  CB26A 201306-201308           1 0.461774
  CB26A 201307-201309           1 0.451327
  CB26A 201308-201310           1 0.461378
where
Period = a 3-month window; each successive window is shifted forward by 1 month
No.ofChange = number of times ATT_1 changes value within this period
%Paid = sum(A3)/(sum(A1)+sum(A2)) over the rows in this period
E.g. for Period=201302-201304,
%Paid = (74+77+77+72)/((146+140+128+146)+(42+50+36+36)) = 300/724 = 0.414365
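The worked example can be checked directly in R. A minimal sketch with the values copied from the 201302-201304 rows; counting changes with sum(diff(att) != 0) is an assumed shortcut that gives the same count as the ATT_1[-1] != ATT_1[-length(ATT_1)] comparison used in the answers.

```r
# Rows falling in 201302-201304 (values copied from the table above)
att <- c(1, 0, 0, 1)
a1  <- c(146, 140, 128, 146)
a2  <- c(42, 50, 36, 36)
a3  <- c(74, 77, 77, 72)

paid     <- sum(a3) / (sum(a1) + sum(a2))  # 300 / (560 + 164)
n_change <- sum(diff(att) != 0)            # 1->0 and 0->1 count, 0->0 does not

round(paid, 6)  # 0.414365
n_change        # 2
```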
The period calculation should start from the first YEAR_MTH of each ID_CASE,
i.e., if the first YEAR_MTH for an ID_CASE is 201301 or 201304, its periods
should start from that month.
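Because YEAR_MTH is a yyyymm number, adding 2 to it only works inside one calendar year (201311 + 2 gives 201313, not 201401), and the answers above build their windows that way. A minimal sketch of one workaround, assuming YEAR_MTH is always a valid yyyymm value: map each month to a running month count, build the windows there, and map back. Each ID's windows start at that ID's own first month. The helper names ym_to_index, index_to_ym and month_windows are hypothetical.

```r
# yyyymm -> running month count, e.g. 201302 -> 2013 * 12 + 1
ym_to_index <- function(ym) (ym %/% 100) * 12 + (ym %% 100 - 1)

# running month count -> yyyymm
index_to_ym <- function(i) (i %/% 12) * 100 + i %% 12 + 1

# All windows of `width` months from the first to the last month of one ID_CASE,
# each shifted by one month; returns start/end as yyyymm numbers
month_windows <- function(ym, width = 3) {
  first <- ym_to_index(min(ym))
  last  <- ym_to_index(max(ym))
  if (last - first + 1 < width) {
    # fewer than `width` months of data: no complete window
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  starts <- seq(first, last - width + 1)
  data.frame(start = index_to_ym(starts), end = index_to_ym(starts + width - 1))
}

# An ID whose data starts in November 2013 and runs into 2014
month_windows(c(201311, 201312, 201401, 201402))
# start = 201311, 201312; end = 201401, 201402
```

The rows of one ID_CASE belonging to a window can then be selected by comparing ym_to_index(YEAR_MTH) with ym_to_index(start) and ym_to_index(end), and summarised as in the answers above.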
My data frame has 400 unique ID_CASE values, and I need to do this for every one of them.
How can I do it in R?
Regards,
Abhinaba
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.