[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

Lopez, Dan lopez235 at llnl.gov
Tue Nov 12 01:41:28 CET 2013


Hi Gabor,

This is a great solution!  I will use it.

Thank you!

Dan


-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: Monday, November 11, 2013 3:02 PM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?

On Mon, Nov 11, 2013 at 3:50 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> Hi R Experts,
>
> How do I mark rows in dataframe based on a condition that's based off another row in the same dataframe?
>
> I want to mark any combination of FY,ID, TT=='HC' rows that have a FY,ID,TT=='TER' row with a 1.  In my example below this is rows 4, 7 and 11.
> My data looks something like this:
>     FY ID  TT
> 1  FY09  1  HC
> 2  FY10  1  HC
> 3  FY11  1  HC
> 4  FY12  1  HC
> 5  FY12  1 TER
> 6  FY09  2  HC
> 7  FY10  2  HC
> 8  FY10  2 TER
> 9  FY11  2  HC
> 10 FY12  2  HC
> 11 FY13  2  HC
> 12 FY13  2 TER
>
> I know for this specific example I can use:
> HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)
>
> However my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor variable. I want to know if there is another way of doing the same thing without sorting the data.
> I tried the last line of code below but it gave me unexpected results. It marks the first three rows with 0 and everything else with 1.  Based on the warning messages looks like it has something to do with longer object length is not a multiple of shorter object length. But I am now stumped.
>
> #REPRODUCIBLE EXAMPLE
> FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","
> FY11","FY12","FY13","FY13"))
> ID<-c(rep(1,5),rep(2,7))
> TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
> HTDF<-data.frame(FY,ID,TT)
>
> #Summarize data and get max TT. TT is a binary factor variable
> library(sqldf)
> HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP BY 
> ID,FY')
>
> # Initiate new variable and assign 0 or 1
> HTDF$EXCL<-0
>
> # THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
> HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==H
> TDF.MAX$MAXTT,0,1)

For each FY, ID group ave applies f to TT == 'TER' returning a logical vector that is TRUE for each HC if TER is in the group
ad otherwise FALSE.   Finally we add 0 to convert from
TRUE/FALSE to 1/0.

The rows of HTDF need not be in any specific order and their oreder will be preserved.

> f <- function(x) any(x) & !x
> transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0)
     FY ID  TT EXCL
1  FY09  1  HC    0
2  FY10  1  HC    0
3  FY11  1  HC    0
4  FY12  1  HC    1
5  FY12  1 TER    0
6  FY09  2  HC    0
7  FY10  2  HC    1
8  FY10  2 TER    0
9  FY11  2  HC    0
10 FY12  2  HC    0
11 FY13  2  HC    1
12 FY13  2 TER    0


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list