[R] Manipulating dataframes

arun smartpink111 at yahoo.com
Thu May 15 17:34:23 CEST 2014



Hi,

May be this helps:

dat <- read.table(text="ID    YEAR_MONTH    ATT_1    ATT_2
1    201301                   Y          1
1    201302                   Y          1
1    201302                   N          0
1    201302                   Y          0
1    201303                   N          1
3    201301                   N          1
3    201302                   N          0
3    201302                   Y          0
3    201302                   Y          1
3    201303                   Y          1",sep="",header=TRUE,stringsAsFactors=FALSE)


res <- read.table(text="ID    YEAR_MONTH    YEARMONTH_LAG1    ATT1_CHNG  ATT2_CHNG
1      201301                   NA                                   NA          NA
1      201302                201301                                0            0
1      201303                201302                                2            1
3      201301                  NA                                    NA         NA
3      201302                201301                                0           0
3      201303                201302                                1           1",sep="",header=TRUE) 


dat$YEARMONTH_LAG1 <- with(dat, ave(YEAR_MONTH, ID, FUN = function(x) c(NA, head(x, 
    -1))))

res1 <- unsplit(lapply(split(dat, list(dat$ID, dat$YEAR_MONTH), drop = FALSE), function(x) {
    within(x, {
        ATT2_CHNG <- sum(ATT_2[-1] != ATT_2[-nrow(x)])
        ATT1_CHNG <- sum(ATT_1[-1] != ATT_1[-nrow(x)])
    })
    
}), list(dat$ID, dat$YEAR_MONTH))
res2 <- res1[!with(res1, ave(YEAR_MONTH, list(ID), FUN = function(x) duplicated(x))), 
    c(1:2, 5:7)]
res2[, 4:5] <- lapply(res2[, 4:5], function(x) ave(x, res2$ID, FUN = function(x) c(NA, 
    x[-length(x)])))
row.names(res2) <- 1:nrow(res2)
identical(res, res2)
#[1] TRUE

A.K.


On Thursday, May 15, 2014 11:31 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
Hi R-helpers,

I have a dataframe as shown below

ID    YEAR_MONTH    ATT_1    ATT_2
1    201301                   Y          1
1    201302                   Y          1
1    201302                   N          0
1    201302                   Y          0
1    201303                   N          1
3    201301                   N          1
3    201302                   N          0
3    201302                   Y          0
3    201302                   Y          1
3    201303                   Y          1

The final dataframe I want should look like

ID    YEAR_MONTH    YEARMONTH_LAG1    ATT1_CHNG  ATT2_CHNG
1      201301                   NA                                   NA
         NA
1      201302                201301                                0
           0
1      201303                201302                                2
           1
3      201301                  NA                                    NA
         NA
3      201302                201301                                0
           0
3      201303                201302                                1
           1

Note:
1. 'YEARMONTH_LAG1 ' is the previous month corresponding to the current
month. E.g., if YEAR_MONTH == 201301 then YEARMONTH_LAG1 = NA (as there is
no record for 201212, and there is none in my data as all starts from
201201). Similarly if YEAR_MONTH == 201302 then YEARMONTH_LAG1 = 201301.

2. ATT1_CHNG is the number of times the level has changed (i.e., from Y to
N and vice-verse) for 'ATT_1' in the previous month (i.e., in
YEARMONTH_LAG1)

3. ATT2_CHNG is the number of times the level has changed (i.e., from 0 to
1 and vice-verse) for ATT_2 in the previous month

How can this be done in R?
-- 
Regards
Abhinaba Roy

    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list