[R] DATA SUMMARIZING and REPORTING

arun smartpink111 at yahoo.com
Thu Jul 31 06:34:36 CEST 2014


For the example you gave:

x ##dataset

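## Each row of indx is one 3-month window, c(start, end), in YYYYMM form.
## YEAR_MTH is the month column from the question. Plain integer arithmetic
## on YYYYMM is only valid while all months fall in one calendar year.
indx <- t(sapply(min(x$YEAR_MTH):(max(x$YEAR_MTH) - 2), function(m) c(m, m + 2)))

res <- do.call(rbind, apply(indx, 1, function(.indx) {
    ## rows that fall inside the current window
    x1 <- x[x$YEAR_MTH >= .indx[1] & x$YEAR_MTH <= .indx[2], ]
    Period <- paste(.indx[1], .indx[2], sep = "-")
    ## count the positions where ATT_1 differs from the previous row
    No.ofChange <- sum(x1$ATT_1[-1] != x1$ATT_1[-length(x1$ATT_1)])
    Paid <- with(x1, sum(A3)/(sum(A1) + sum(A2)))
    data.frame(ID_CASE = x$ID_CASE[1L], Period, No.ofChange, Paid, stringsAsFactors = FALSE)
}))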


 res
  ID_CASE        Period No.ofChange      Paid
1   CB26A 201302-201304           2 0.4143646
2   CB26A 201303-201305           2 0.4452450
3   CB26A 201304-201306           1 0.4444444
4   CB26A 201305-201307           2 0.4607407
5   CB26A 201306-201308           1 0.4617737
6   CB26A 201307-201309           1 0.4513274
7   CB26A 201308-201310           1 0.4613779
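
The windows above are built with plain integer arithmetic on YYYYMM values, which holds for this example because every month is in 2013. If the data crossed a year boundary, 201311 + 2 would give 201313 rather than 201401. A minimal sketch of one way around that follows; add_months and make_windows are hypothetical helper names, not part of the code above.

```r
## Hypothetical helper: add k months to a YYYYMM value, rolling into the
## next year when needed.
add_months <- function(ym, k) {
    total <- (ym %/% 100) * 12 + (ym %% 100 - 1) + k   # months counted from year 0
    as.integer((total %/% 12) * 100 + total %% 12 + 1)
}

## Hypothetical helper: all windows of 'width' months between first and last,
## one row per window, columns start and end (both YYYYMM).
make_windows <- function(first, last, width = 3) {
    n <- ((last %/% 100) * 12 + last %% 100) -
         ((first %/% 100) * 12 + first %% 100) + 1     # months spanned, inclusive
    if (n < width) return(matrix(integer(0), ncol = 2))
    start <- add_months(first, seq_len(n - width + 1) - 1)
    cbind(start = start, end = add_months(start, width - 1))
}

indx2 <- make_windows(201311, 201402)
```

Under these assumptions, indx2 could be used in place of indx in the apply() call, since each row is again a c(start, end) pair.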


With multiple ID_CASE values, either split the dataset by ID_CASE or use one of the grouping functions, and then apply the code above to each group.
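
As a rough sketch of the split approach: the steps above are wrapped in a function and applied to each piece returned by split(). The name summarise_case, its mth argument, and the second case "ZZ01" with its values are made up for illustration; the CB26A rows are the ones from the question. It keeps the same single-calendar-year assumption as the code above.

```r
## Hypothetical wrapper; 'mth' names the YYYYMM column (assumed to be YEAR_MTH).
summarise_case <- function(d, mth = "YEAR_MTH") {
    m <- d[[mth]]
    if (max(m) - min(m) < 2) return(NULL)      # fewer than 3 months: no window
    starts <- min(m):(max(m) - 2)              # assumes one calendar year
    do.call(rbind, lapply(starts, function(s) {
        d1 <- d[m >= s & m <= s + 2, ]         # rows inside the window
        data.frame(ID_CASE = as.character(d$ID_CASE[1L]),
                   Period = paste(s, s + 2, sep = "-"),
                   No.ofChange = sum(d1$ATT_1[-1] != d1$ATT_1[-nrow(d1)]),
                   Paid = sum(d1$A3) / (sum(d1$A1) + sum(d1$A2)),
                   stringsAsFactors = FALSE)
    }))
}

## CB26A rows from the question, plus a made-up second case "ZZ01"
dat <- data.frame(
    ID_CASE  = c(rep("CB26A", 11), rep("ZZ01", 3)),
    YEAR_MTH = c(201302, 201302, 201303, 201304, 201305, 201305, 201306,
                 201307, 201309, 201310, 201310, 201304, 201305, 201306),
    ATT_1    = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1),
    A1       = c(146, 140, 128, 146, 134, 148, 134, 125, 122, 126, 107,
                 100, 100, 100),
    A2       = c(42, 50, 36, 36, 36, 30, 20, 48, 44, 37, 43, 50, 50, 50),
    A3       = c(74, 77, 77, 72, 80, 80, 72, 79, 74, 72, 75, 30, 60, 45),
    stringsAsFactors = FALSE)

## One result block per ID_CASE, stacked into a single data frame
res_all <- do.call(rbind, lapply(split(dat, dat$ID_CASE), summarise_case))
rownames(res_all) <- NULL
```

Each ID_CASE gets its own first and last month this way, so the periods start from the first YEAR_MTH of that case.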


A.K.




On Wednesday, July 30, 2014 8:48 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
Hi R-helpers,

I have a dataframe like

  ID_CASE YEAR_MTH ATT_1  A1 A2 A3
  CB26A     201302     1 146 42 74
  CB26A     201302     0 140 50 77
  CB26A     201303     0 128 36 77
  CB26A     201304     1 146 36 72
  CB26A     201305     1 134 36 80
  CB26A     201305     0 148 30 80
  CB26A     201306     0 134 20 72
  CB26A     201307     1 125 48 79
  CB26A     201309     0 122 44 74
  CB26A     201310     1 126 37 72
  CB26A     201310     1 107 43 75

I want a final dataframe which will look like

  ID_CASE Period        No.ofChange %Paid
  CB26A   201302-201304           2 0.414365
  CB26A   201303-201305           2 0.445245
  CB26A   201304-201306           1 0.444444
  CB26A   201305-201307           2 0.460741
  CB26A   201306-201308           1 0.461774
  CB26A   201307-201309           1 0.451327
  CB26A   201308-201310           1 0.461378

where,
Period = a time period of 3 months which is shifted by 1 month subsequently

No.ofChange = number of times ATT_1 has changed value in this period

%Paid = sum(A3)/(sum(A1)+sum(A2)) for this period
E.g. for Period=201302-201304,
%Paid = (74+77+77+72)/((146+140+128+146)+(42+50+36+36))

Period calculation should start from the first YEAR_MTH for the ID_CASE,
i.e., if the first YEAR_MTH for an ID_CASE is 201301 or 201304, then the
periods should be defined accordingly.

I have a dataframe with 400 unique ID_CASE values, and I need to do this for all of them.

How can I do it in R?

Regards,
Abhinaba

