[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Lopez, Dan
lopez235 at llnl.gov
Tue Nov 12 01:02:20 CET 2013
Thanks.
Dan
-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Monday, November 11, 2013 2:26 PM
To: R help (r-help at r-project.org)
Cc: Lopez, Dan
Subject: Re: [R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi,
You may try:
fun1 <- function(dat){
dat$EXCL3 <- 0
dat$EXCL3[dat$TT=="HC"] <- 1*as.character(interaction(dat[,1:2]))[dat$TT=="HC"] %in% as.character(interaction(dat[,1:2]))[dat$TT=="TER"]
dat
}
fun1(HTDF)
set.seed(14)
indx <- sample(1:nrow(HTDF),12)
HTDF1 <- HTDF[indx,]
fun1(HTDF1)
A.K.
On Monday, November 11, 2013 4:49 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote:
Hi R Experts,
How do I mark rows in dataframe based on a condition that's based off another row in the same dataframe?
I want to mark any combination of FY,ID, TT=='HC' rows that have a FY,ID,TT=='TER' row with a 1. In my example below this is rows 4, 7 and 11.
My data looks something like this:
FY ID TT
1 FY09 1 HC
2 FY10 1 HC
3 FY11 1 HC
4 FY12 1 HC
5 FY12 1 TER
6 FY09 2 HC
7 FY10 2 HC
8 FY10 2 TER
9 FY11 2 HC
10 FY12 2 HC
11 FY13 2 HC
12 FY13 2 TER
I know for this specific example I can use:
HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)
However my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor variable. I want to know if there is another way of doing the same thing without sorting the data.
I tried the last line of code below but it gave me unexpected results. It marks the first three rows with 0 and everything else with 1. Based on the warning messages looks like it has something to do with longer object length is not a multiple of shorter object length. But I am now stumped.
#REPRODUCIBLE EXAMPLE
FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13"))
ID<-c(rep(1,5),rep(2,7))
TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
HTDF<-data.frame(FY,ID,TT)
#Summarize data and get max TT. TT is a binary factor variable
library(sqldf)
HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP BY ID,FY')
# Initiate new variable and assign 0 or 1
HTDF$EXCL<-0
# THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAXTT,0,1)
Dan Lopez
Workforce Analyst
LLNL
HRIM - Workforce Analytics & Metrics
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list