[R] Duplicated function with conditional statement
arun
smartpink111 at yahoo.com
Sun Jul 28 03:11:24 CEST 2013
Hi,
Maybe this is what you wanted.
#using tt1
indx<-which(tt1$response=="buy")
tt1$newcolumn<-0
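tt1[unique(unlist(lapply(seq_along(indx),function(i){
  #rows from this "buy" up to the next one (or to the end of the data)
  x1<-if(i==length(indx)) seq(indx[i],nrow(tt1)) else if((indx[i+1]-indx[i])==1) indx[i] else seq(indx[i]+1,indx[i+1]-1)
  #all "buy" rows so far plus the current block
  x2<- tt1[unique(c(indx[1:i],x1)),]
  x3<-subset(x2,response=="sample")
  x4<- subset(x2,response=="buy")
  #"buy" rows whose product was already bought earlier
  x5<-row.names(x4)[duplicated(x4$product)]
  #"sample" rows in the current block whose product was already bought
  x6<-if(nrow(x3)!=0) row.names(x3)[x3$product%in% x4$product]
  sort(c(x5,x6))
}))),"newcolumn"]<-1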
tt1
   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0
11    5      buy       4         1
12    5   sample       2         1
13    5      buy       2         1
14    6      buy       4         1
15    6   sample       5         0
16    6   sample       5         0
17    7   sample       4         1
18    7      buy       3         1
19    7      buy       4         1
20    8      buy       5         0
21    8   sample       4         1
22    8      buy       2         1
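
For comparison, a shorter vectorised sketch that gives the same newcolumn on tt1. It assumes the rule is simply "the product already occurred in an earlier row whose response is buy". The helper name flag_prior_buy is made up for illustration, and it has only been checked against the output above.

```r
# tt1 as posted earlier in this thread, rebuilt with plain vectors
tt1 <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("sample", "sample", "buy", "sample", "buy", "sample", "sample",
               "buy", "sample", "buy", "buy", "sample", "buy", "buy", "sample",
               "sample", "sample", "buy", "buy", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)

# Hypothetical helper (not from the thread): 1 if the product was already
# seen in an EARLIER row with response == "buy", otherwise 0.
flag_prior_buy <- function(dat) {
  buy_rows <- which(dat$response == "buy")
  # match() returns the first hit, i.e. the earliest "buy" row per product
  # (NA when the product is never bought)
  first_buy <- buy_rows[match(dat$product, dat$product[buy_rows])]
  as.integer(!is.na(first_buy) & seq_len(nrow(dat)) > first_buy)
}

tt1$newcolumn <- flag_prior_buy(tt1)
tt1
```

Because it only looks up the earliest "buy" row for each product, this avoids the block-by-block bookkeeping, at the cost of hard-coding the column names response and product.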
A.K.
________________________________
From: vanessa van der vaart <vanessa.vaart at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: David Winsemius <dwinsemius at comcast.net>; R help <r-help at r-project.org>
Sent: Saturday, July 27, 2013 6:55 PM
Subject: Re: [R] Duplicated function with conditional statement
Dear all,
Thank you all for your help. It has been a great help, but it is not exactly what I am looking for. Apparently I haven't explained the condition very clearly. I hope this works.
If the value in the product column duplicates a value from a previous row (this applies to rows with response == buy and with response == sample), and the row it duplicates has response == 'buy', then the value is 1; otherwise it is 0.
So in that case,
if the value is duplicated, but only from a previous row where response == sample, then it is not considered duplicated, and the value in the new column is 0.
Thank you very much in advance,
I really appreciate it.
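
To make the condition above concrete, the sketch below compares plain duplicated() with the desired column on the original 10-row example. The wanted vector is copied from the table in the first post; the variable names plain and wanted are only for illustration.

```r
# Original 10-row example from the first post in this thread
tt <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  response = c("sample", "sample", "buy", "sample", "buy",
               "sample", "sample", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
)

# Desired column, copied from the table in the original question
wanted <- c(0, 0, 0, 0, 0, 1, 1, 0, 1, 0)

# duplicated() ignores `response`, so it flags every repeated product
plain <- as.integer(duplicated(tt$product))
plain                    # 0 0 0 1 1 1 1 1 1 0

# Rows where plain duplicated() disagrees with the desired column
which(plain != wanted)   # 4 5 8
tt[plain != wanted, ]
```

In rows 4, 5 and 8 the product did occur before, but only in "sample" rows, which is why the desired value there is 0.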
On Sat, Jul 27, 2013 at 3:45 AM, arun <smartpink111 at yahoo.com> wrote:
>
>On some slightly different datasets:
>tt1<-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
>1L, 2L, 1L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5,
> 5, 4, 3, 4, 5, 4, 2)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt2<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
>1L, 2L, 2L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4,
> 2, 3, 3, 2, 5, 3, 4)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>tt3<- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5,
>6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L,
>2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
>1L, 1L, 2L), .Label = c("buy", "sample"), class = "factor"),
> product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2,
> 2, 2, 2, 4, 3, 2, 5)), .Names = c("subj", "response", "product"
>), class = "data.frame", row.names = c(NA, 22L))
>
>
>#Tried David's solution:
>tt1$rown <- rownames(tt1)
>as.numeric ( apply(tt1, 1, function(x) {
> x['product'] %in% tt1[ rownames(tt1) < x['rown'] & tt1$response == "buy", "product"] } ) )
> #gave inconsistent results especially since the first 10 rows were from `tt`
># [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1
>
>#similarly for `tt2` and `tt3`.
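
A possible explanation for the inconsistency, offered as an observation rather than something stated in the thread: row names are character strings, so rownames(tt1) < x['rown'] compares text, and "10" sorts before "2". That goes unnoticed with the 10 rows of tt but not with 22 rows. The sketch below compares integer row positions instead; the function name prior_buy_by_position is hypothetical.

```r
# Row names are character, so `<` on them is a text comparison
"10" < "2"   # TRUE as text in the usual collations, although 10 > 2
10 < 2       # FALSE

# tt1 rebuilt from the factor codes posted above (1 = "buy", 2 = "sample")
tt1 <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("buy", "sample")[c(2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1,
                                  2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1)],
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2)
)

# Same idea as the apply() call, but "earlier row" is decided by integer
# position, so row 10 is never treated as coming before row 2.
prior_buy_by_position <- function(dat) {
  pos    <- seq_len(nrow(dat))
  is_buy <- dat$response == "buy"
  vapply(pos, function(i) {
    as.integer(dat$product[i] %in% dat$product[is_buy & pos < i])
  }, integer(1))
}

prior_buy_by_position(tt1)
# [1] 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1
```

Working on positions also sidesteps the character conversion that apply() performs on a data frame with mixed column types.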
>
>
>##Created this function. It seems to work in the tested cases, though it is not tested extensively.
>fun1<- function(dat,colName,newColumn){
>  #rows where the response is "buy"
>  indx<- which(dat[,colName]=="buy")
>  dat[,newColumn]<-0
>  dat[unlist(lapply(seq_along(indx),function(i){
>    #rows from this "buy" up to the next one (or to the end of the data)
>    x1<- if(i==length(indx)){
>      seq(indx[i],nrow(dat))
>    } else if((indx[i+1]-indx[i])==1){
>      indx[i]
>    } else {
>      seq(indx[i]+1,indx[i+1]-1)
>    }
>    #all "buy" rows so far plus the current block
>    x2<- dat[unique(c(indx[i:1],x1)),]
>    #use colName here too, rather than a hard-coded `response`
>    x3<- x2[x2[,colName]=="sample",]
>    x4<- x2[x2[,colName]=="buy",]
>    if(nrow(x3)!=0) {
>      row.names(x3)[x3$product%in% x4$product]
>    }
>  })),newColumn]<-1
>  dat
>}
>fun1(tt,"response","newCol")
>#   subj response product rown newCol
>#1     1   sample       1    1      0
>#2     1   sample       2    2      0
>#3     1      buy       3    3      0
>#4     2   sample       2    4      0
>#5     2      buy       2    5      0
>#6     3   sample       3    6      1
>#7     3   sample       2    7      1
>#8     3      buy       1    8      0
>#9     4   sample       1    9      1
>#10    4      buy       4   10      0
>
>fun1(tt1,"response","newCol")
>#   subj response product newCol
>#1     1   sample       1      0
>#2     1   sample       2      0
>#3     1      buy       3      0
>#4     2   sample       2      0
>#5     2      buy       2      0
>#6     3   sample       3      1
>#7     3   sample       2      1
>#8     3      buy       1      0
>#9     4   sample       1      1
>#10    4      buy       4      0
>#11    5      buy       4      0
>#12    5   sample       2      1
>#13    5      buy       2      0
>#14    6      buy       4      0
>#15    6   sample       5      0
>#16    6   sample       5      0
>#17    7   sample       4      1
>#18    7      buy       3      0
>#19    7      buy       4      0
>#20    8      buy       5      0
>#21    8   sample       4      1
>#22    8      buy       2      0
>#Also
> fun1(tt2,"response","newCol")
> fun1(tt3,"response","newCol")
>A.K.
>
>P.S. Below is OP's clarification regarding the conditional statement in a private message:
>
>I am sorry I didn't phrase the question very clearly. Let me restate the
>conditional statement; I hope you can understand it. I will explain by
>example.
>
>As you can see, almost every number is duplicated, but only in the 6th, 7th, and 9th rows is the value in the new column 1.
>
>In the 4th row, the value is duplicated (2 already occurred in the 2nd row), but
>since a value is considered duplicated only if it duplicates a value from
>a row where the response is 'buy', the value in the new column for the
>4th row is still zero.
>
>In the 6th row, the value in the product column is 3. 3 already occurred
>in the 3rd row, where the response is 'buy', so the value in the new column
>should be 1.
>
>I hope this makes the conditional statement understandable.
>
>----- Original Message -----
>From: David Winsemius <dwinsemius at comcast.net>
>To: David Winsemius <dwinsemius at comcast.net>
>Cc: R-help at r-project.org; Uwe Ligges <ligges at statistik.tu-dortmund.de>
>Sent: Friday, July 26, 2013 5:16 PM
>Subject: Re: [R] Duplicated function with conditional statement
>
>
>On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:
>
>>
>> On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
>>
>>>
>>>
>>> On 25.07.2013 21:05, vanessa van der vaart wrote:
>>>> Hi everybody,
>>>> I have a question about the R function duplicated(). I have spent days trying to
>>>> figure this out, but I can't find a solution yet. I hope somebody can help
>>>> me.
>>>> This is my data:
>>>>
>>>> subj=c(1,1,1,2,2,3,3,3,4,4)
>>>> response=c('sample','sample','buy','sample','buy','sample','sample','buy','sample','buy')
>>>> product=c(1,2,3,2,2,3,2,1,1,4)
>>>> tt=data.frame(subj, response, product)
>>>>
>>>> The data look like this:
>>>>
>>>>    subj response product
>>>> 1     1   sample       1
>>>> 2     1   sample       2
>>>> 3     1      buy       3
>>>> 4     2   sample       2
>>>> 5     2      buy       2
>>>> 6     3   sample       3
>>>> 7     3   sample       2
>>>> 8     3      buy       1
>>>> 9     4   sample       1
>>>> 10    4      buy       4
>>>>
>>>> I want to create a new column based on the values in the response and product
>>>> columns. If the value in product is duplicated, then the value in the new column
>>>> is 1; otherwise it is 0.
>>>
>>>
>>> According to your description:
>>>
>>
>> Agree that the description did not match the output. I tried to match the output using a rule that could be expressed as:
>>
>> if( a "buy"- associated "product" value precedes the current "product" value){1}else{0}
>>
>
>So this delivers the specified output:
>
>tt$rown <- rownames(tt)
>as.numeric ( apply(tt, 1, function(x) {
> x['product'] %in% tt[ rownames(tt) < x['rown'] & tt$response == "buy", "product"] } ) )
>
># [1] 0 0 0 0 0 1 1 0 1 0
>
>> --
>> David.
>>
>>> tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")
>>>
>>> which is different from what you show us below, from which I cannot derive any systematic rule.
>>>
>>> Uwe Ligges
>>>
>>>> But I want to add a conditional statement: the value in the product column
>>>> will only be considered duplicated if the value in the response column is
>>>> 'buy'.
>>>> For illustration, the table should look like this:
>>>>
>>>>    subj response product newcolumn
>>>> 1     1   sample       1         0
>>>> 2     1   sample       2         0
>>>> 3     1      buy       3         0
>>>> 4     2   sample       2         0
>>>> 5     2      buy       2         0
>>>> 6     3   sample       3         1
>>>> 7     3   sample       2         1
>>>> 8     3      buy       1         0
>>>> 9     4   sample       1         1
>>>> 10    4      buy       4         0
>>>>
>>>>
>>>> Can somebody help me?
>>>> Any help will be appreciated.
>>>> I am new to this mailing list, so forgive me in advance if I did not ask
>>>> the question appropriately.
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
>David Winsemius
>Alameda, CA, USA
>