[R] Duplicated function with conditional statement

arun smartpink111 at yahoo.com
Sat Jul 27 04:45:14 CEST 2013



On some slightly different datasets:
tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 
    5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))

tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4, 
    2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))

tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2, 
    2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
), class = "data.frame", row.names = c(NA, 22L))


#Tried David's solution:
tt1$rown <- rownames(tt1)
as.numeric ( apply(tt1, 1, function(x) {
    x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response == "buy", "product"]  } ) )
  #gave inconsistent results: the first 10 rows are identical to `tt`, yet their values differ from the `tt` result
# [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1

#similarly for `tt2` and `tt3`.
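
A likely cause of the inconsistency, offered as a sketch rather than a verified diagnosis: rownames() returns character strings, so `rownames(tt1) < x['rown']` compares text, and "10" sorts before "2". With only 10 rows in `tt` this happens to be harmless, because row 10's product (4) occurs nowhere else. The version below compares integer row positions instead. `prev_bought` is a hypothetical helper name, and the column names `response` and `product` are assumed to match the data above.

```r
# Character comparison is lexicographic, so row "10" counts as "before" row "2".
chr_cmp <- "10" < "2"      # TRUE
num_cmp <- 10 < 2          # FALSE

# Original 10-row example from the thread.
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# Hypothetical helper: 1 if the row's product appears in an earlier "buy" row.
prev_bought <- function(dat) {
  is_buy <- dat$response == "buy"
  vapply(seq_len(nrow(dat)), function(i) {
    earlier <- seq_len(i - 1L)   # integer positions, not row-name strings
    as.integer(dat$product[i] %in% dat$product[earlier][is_buy[earlier]])
  }, integer(1))
}

res_tt <- prev_bought(tt)
# [1] 0 0 0 0 0 1 1 0 1 0
```

Like David's rule, this also flags a "buy" row whose product was bought earlier; the ten-row example cannot distinguish that reading from one that flags "sample" rows only.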


##Created this function.  It seems to work in the tested cases, though it is not tested extensively.
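fun1 <- function(dat, colName, newColumn) {
  # Row positions of all "buy" rows.
  indx <- which(dat[, colName] == "buy")
  dat[, newColumn] <- 0
  flagged <- unlist(lapply(seq_along(indx), function(i) {
    # x1: the rows that follow the i-th "buy" up to the next "buy"
    # (or to the end of the data for the last "buy").
    x1 <- if (i == length(indx)) {
      seq(indx[i], nrow(dat))
    } else if ((indx[i + 1] - indx[i]) == 1) {
      indx[i]
    } else {
      seq(indx[i] + 1, indx[i + 1] - 1)
    }
    # x2: every "buy" row up to and including the i-th one, plus x1.
    x2 <- dat[unique(c(indx[i:1], x1)), ]
    # Use colName here rather than a hard-coded `response` column.
    x3 <- x2[x2[, colName] == "sample", ]
    x4 <- x2[x2[, colName] == "buy", ]
    # Row names of "sample" rows whose product was bought earlier
    # (the column name `product` is still assumed).
    if (nrow(x3) != 0) {
      row.names(x3)[x3$product %in% x4$product]
    }
  }))
  dat[flagged, newColumn] <- 1
  dat
}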
fun1(tt,"response","newCol")
#   subj response product rown newCol
#1     1   sample       1    1      0
#2     1   sample       2    2      0
#3     1      buy       3    3      0
#4     2   sample       2    4      0
#5     2      buy       2    5      0
#6     3   sample       3    6      1
#7     3   sample       2    7      1
#8     3      buy       1    8      0
#9     4   sample       1    9      1
#10    4      buy       4   10      0

fun1(tt1,"response","newCol")
#   subj response product newCol
#1     1   sample       1      0
#2     1   sample       2      0
#3     1      buy       3      0
#4     2   sample       2      0
#5     2      buy       2      0
#6     3   sample       3      1
#7     3   sample       2      1
#8     3      buy       1      0
#9     4   sample       1      1
#10    4      buy       4      0
#11    5      buy       4      0
#12    5   sample       2      1
#13    5      buy       2      0
#14    6      buy       4      0
#15    6   sample       5      0
#16    6   sample       5      0
#17    7   sample       4      1
#18    7      buy       3      0
#19    7      buy       4      0
#20    8      buy       5      0
#21    8   sample       4      1
#22    8      buy       2      0
#Also
 fun1(tt2,"response","newCol")
 fun1(tt3,"response","newCol")
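
For comparison, here is a shorter vectorized sketch that should give the same flags as `fun1` on `tt1`. It is not tested beyond that case. `flag_sampled_after_buy` and its argument names are hypothetical, and it assumes the rule "a 'sample' row is flagged when its product was bought in an earlier row".

```r
# tt1 rebuilt from the factor codes in the dput() output above (1 = buy, 2 = sample).
tt1 <- data.frame(
  subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = factor(c("buy", "sample")[c(2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1,
                                          2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1)]),
  product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)

flag_sampled_after_buy <- function(dat, respCol = "response", prodCol = "product") {
  is_buy   <- dat[[respCol]] == "buy"
  buy_rows <- which(is_buy)
  # Row position of the first "buy" of each row's product (NA if never bought).
  first_buy <- buy_rows[match(dat[[prodCol]], dat[[prodCol]][buy_rows])]
  # Flag only "sample" rows that come after that first "buy".
  as.integer(!is_buy & !is.na(first_buy) & first_buy < seq_len(nrow(dat)))
}

res_tt1 <- flag_sampled_after_buy(tt1)
which(res_tt1 == 1)
# [1]  6  7  9 12 17 21
```

Dropping the `!is_buy` term would also flag repeat purchases (for example row 11 of `tt1`), which is the other possible reading of the rule.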
A.K.

P.S.  Below is OP's clarification regarding the conditional statement in a private message:

I am sorry I did not ask the question very clearly. Let me restate the 
conditional statement; I hope you can understand. I will explain by 
example.

As you can see, almost every product number is duplicated, but only in rows 6, 7, and 9 is the value in the new column 1.

In row 4 the value is duplicated (2 already occurred in row 2), but a 
value counts as duplicated only if it previously occurred in a row 
where the response is 'buy', so the new column in row 4 is 
still zero. 

In row 6 the value in the product column is 3. 3 already occurred 
in row 3, where the response is 'buy', so the value in the new column 
should be 1.

I hope this makes the conditional statement understandable. 
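
The difference between rows 4 and 6 can be traced with a small sketch. It uses the OP's original ten rows; `bought_before` is a hypothetical helper introduced only for illustration.

```r
product  <- c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
response <- c("sample", "sample", "buy", "sample", "buy",
              "sample", "sample", "buy", "sample", "buy")

# Plain duplicated() ignores the response column entirely.
plain_dup <- duplicated(product)

# Hypothetical helper: products bought strictly before row i.
bought_before <- function(i) {
  earlier <- seq_len(i - 1L)
  product[earlier][response[earlier] == "buy"]
}

row4 <- product[4] %in% bought_before(4)   # FALSE: only product 3 was bought so far
row6 <- product[6] %in% bought_before(6)   # TRUE: product 3 was bought in row 3
```

So `plain_dup[4]` is TRUE while `row4` is FALSE, which is the distinction the clarification describes.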








----- Original Message -----
From: David Winsemius <dwinsemius at comcast.net>
To: David Winsemius <dwinsemius at comcast.net>
Cc: R-help at r-project.org; Uwe Ligges <ligges at statistik.tu-dortmund.de>
Sent: Friday, July 26, 2013 5:16 PM
Subject: Re: [R] Duplicated function with conditional statement


On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

> 
> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
> 
>> 
>> 
>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>> Hi everybody,
>>> I have a question about the R function duplicated(). I have spent days trying to
>>> figure this out, but I can't find a solution yet. I hope somebody can help
>>> me.
>>> This is my data:
>>> 
>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>> response=c('sample','sample','buy','sample','buy','sample',
>>> 'sample','buy','sample','buy')
>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>> tt=data.frame(subj, response, product)
>>> 
>>> the data look like this:
>>> 
>>>    subj response product
>>> 1     1   sample       1
>>> 2     1   sample       2
>>> 3     1      buy       3
>>> 4     2   sample       2
>>> 5     2      buy       2
>>> 6     3   sample       3
>>> 7     3   sample       2
>>> 8     3      buy       1
>>> 9     4   sample       1
>>> 10    4      buy       4
>>> 
>>> I want to create a new column based on the values in the response and product
>>> columns. If the value in product is duplicated, then the value in the new column
>>> is 1; otherwise it is 0.
>> 
>> 
>> According to your description:
>> 
> 
> Agree that the description did not match the output. I tried to match the output using a rule that could be expressed as: 
> 
> if( a "buy"- associated "product" value precedes the current "product" value){1}else{0}
> 

So this delivers the specified output:

tt$rown <- rownames(tt)
as.numeric ( apply(tt, 1, function(x) { 
     x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response == "buy", "product"]  } ) )

# [1] 0 0 0 0 0 1 1 0 1 0

> -- 
> David.
> 
>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>> 
>> which is different from what you show us below, from which I cannot derive any systematic rule.
>> 
>> Uwe Ligges
>> 
>>> but I want to add a conditional statement: the value in the product column
>>> will only be considered duplicated if the value in the response column is
>>> 'buy'.
>>> For illustration, the table should look like this:
>>> 
>>>    subj response product newcolumn
>>> 1     1   sample       1         0
>>> 2     1   sample       2         0
>>> 3     1      buy       3         0
>>> 4     2   sample       2         0
>>> 5     2      buy       2         0
>>> 6     3   sample       3         1
>>> 7     3   sample       2         1
>>> 8     3      buy       1         0
>>> 9     4   sample       1         1
>>> 10    4      buy       4         0
>>> 
>>> 
>>> Can somebody help me?
>>> Any help will be appreciated.
>>> I am new to this mailing list, so forgive me in advance if I did not ask
>>> the question appropriately.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA
