[R] Capture change in variable in R

David Winsemius dwinsemius at comcast.net
Thu May 8 21:52:44 CEST 2014


On May 8, 2014, at 9:49 AM, Abhinaba Roy wrote:

> Hi R helpers,
> 
> I have a dataframe like
> 
>   ID                   Yr_Mnth AMT_PAID AMT_DUE    paidToDue
> CS00000026A    201301      320.48      1904    0.168319328
> CS00000026A    201302    4881.31    15708    0.310753119
> CS00000026A    201303    7609.04    25585    0.297402384
> CS00000026A    201304    9782.70    21896    0.446780234
> CS00000026A    201305    6482.01    22015    0.294436066
> CS00000026A    201306    5226.28    14280    0.365985994
> CS00000026A    201307    9078.47    19040    0.476810399
> CS00000026A    201308    7060.33    23800    0.296652521
> CS00000026A    201309    7595.57    17136    0.443252218
> CS00000026A    201310    5388.64    24752    0.217705236
> 
> The problem I am facing is to capture the change in 'paidToDue', which is
> defined as follows
> 
> Let 'm' be the value of 'Yr_Mnth' in the current row (except the 1st row)
> and 'm-1' be that in the previous row
> 
> I am trying to add a column to the dataframe 'Change' which will have
> values 'Improve','Deteriorate' and 'No change', which are defined as
> 
> 
> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == 1 &
> abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Improve'

There is a `diff` function that may make this all much simpler:

You could translate   (AMT_PAID[m] != AMT_PAID[m-1])   to

  diff(AMT_PAID) != 0   # length is 1 shorter than the input vector

And sign(paidToDue[m]-paidToDue[m-1] ) ==1  to

  diff(paidToDue) > 0   # can pad with c(NA, ...)
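As a small illustration of the padding idea, here is a sketch using the first four rows of the posted data. The vector names `amt_paid`, `paid_to_due`, `paid_changed` and `ratio_up` are made up for this example and are not part of the original data frame.

```r
# First four rows of the posted data (values copied from the question)
amt_paid    <- c(320.48, 4881.31, 7609.04, 9782.70)
paid_to_due <- c(0.168319328, 0.310753119, 0.297402384, 0.446780234)

# diff() returns length(x) - 1 values, so pad with a leading NA
# to line the result up with the rows of the data frame
paid_changed <- c(NA, diff(amt_paid) != 0)
ratio_up     <- c(NA, diff(paid_to_due) > 0)

paid_changed   # NA TRUE TRUE  TRUE
ratio_up       # NA TRUE FALSE TRUE
```

The leading NA marks the first row, which has no previous month to compare against.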

From your incorrect use of parentheses for indexing (R indexes with square brackets), I'm guessing you are very new to R programming. You also attempted to attach a CSV file, and that was rejected by the mail server, which only accepts MIME-text formatted files. Despite the fact that most csv files really are text files, they often get labeled differently by posters' mail clients.


> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == -1
> & abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Deteriorate'
> 
> else 'Change' = 'No change'

If this were just a matter of differences in 'paidToDue' within values of ID, then it would be as simple as:

dat$Change <- with( dat, ave( paidToDue, ID, FUN=function(x){
     c(NA, c('Deteriorate', 'No change', 'Improve')[findInterval(diff(x), c(-Inf, -0.1, 0.1, Inf) )] ) } ) )
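To see how the label lookup works, here is a minimal sketch of `findInterval` applied to a handful of hypothetical month-to-month changes. The vector `d` holds invented values, not figures from the posted data.

```r
# Hypothetical month-to-month changes in paidToDue (invented for illustration)
d <- c(-0.25, -0.05, 0, 0.05, 0.142)

# Break points define left-closed intervals:
#   [-Inf, -0.1) -> 1,  [-0.1, 0.1) -> 2,  [0.1, Inf) -> 3
idx <- findInterval(d, c(-Inf, -0.1, 0.1, Inf))

# Use the interval number as an index into the label vector
change_labels <- c('Deteriorate', 'No change', 'Improve')[idx]

idx             # 1 2 2 2 3
change_labels   # "Deteriorate" "No change" "No change" "No change" "Improve"
```

Because the intervals are closed on the left, a change of exactly 0.1 would land in the 'Improve' bin, whereas the rule in the question uses a strict `> 0.1`. Whether that boundary case matters depends on the data.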
> 
> 
> Note: I have 5000 unique ID in the data and this has to be done for each ID
> and the data is sorted by Yr_Mnth.

When you need to use multiple columns as input and work across rows, I generally use an lapply( split(), fun ) strategy.
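A rough sketch of that split/lapply pattern follows, on a toy data frame with two invented IDs. The helper name `label_change` and all values are hypothetical. The thresholds follow the rule as stated in the question, and this is one possible reading of it rather than a definitive solution.

```r
# Toy data with two hypothetical IDs; column names follow the question
dat <- data.frame(
  ID        = rep(c("A", "B"), each = 3),
  Yr_Mnth   = rep(201301:201303, times = 2),
  AMT_PAID  = c(100, 100, 250, 50, 80, 60),
  paidToDue = c(0.20, 0.35, 0.50, 0.40, 0.25, 0.27),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each row of one ID's block against the previous row
label_change <- function(d) {
  paid_changed <- c(NA, diff(d$AMT_PAID) != 0)   # did AMT_PAID change?
  delta        <- c(NA, diff(d$paidToDue))       # change in paidToDue
  d$Change <- ifelse(paid_changed & delta >  0.1, "Improve",
              ifelse(paid_changed & delta < -0.1, "Deteriorate", "No change"))
  d
}

# Split by ID, process each piece, and stack the pieces back together
res <- do.call(rbind, lapply(split(dat, dat$ID), label_change))
rownames(res) <- NULL

res$Change
# NA "No change" "Improve" NA "Deteriorate" "No change"
```

The first row of each ID comes out as NA because there is no previous month to compare with; it could be recoded to 'No change' if that suits the analysis better. The sketch assumes the rows are already sorted by Yr_Mnth within each ID, as stated in the question.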

> 
> Please find attached the csv file for reference.
> 
> How can it be done in R?


It's not going to be terribly difficult, but I'm concerned this is homework, so I'm not trying for a complete solution. You have not done very much in the way of setting the context.


> -- 
> Regards
> Abhinaba Roy
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


