[R] How to combine conditional argument and logical argument in R to create subset of data...

arun smartpink111 at yahoo.com
Thu Mar 7 00:13:58 CET 2013


Hi,
I am not sure I understand it correctly.



In the example you gave, Tem1 contains duplicated rows, i.e. (222 6), (222 7), and (333 11), and these rows are also present in Tem2.
Is there any chance of triplicates, etc.?
Also, you wanted the rows that are not common to Tem1 and Tem2; e.g. (111 1) is the first row in both.
indxTem1<-paste0(Tem1[,1],Tem1[,2])
indxTem2<-paste0(Tem2[,1],Tem2[,2])

# rows of Tem1 whose key is absent from Tem2, plus the repeated rows of Tem1
res<-rbind(Tem1[!indxTem1%in%indxTem2,], Tem1[duplicated(Tem1),])
res
#       V1 V2
# [1,] 333 12
# [2,] 111 16
# [3,] 111 17
# [4,] 111 20
# [5,] 222 21
# [6,] 222 22
# [7,] 222 23
# [8,] 333  4
# [9,] 333  5
#[10,] 333  6
#[11,] 333  7
#[12,] 222  6
#[13,] 222  7
#[14,] 333 11

If there are more replicates (triplicates, etc.), how do you want to process them? Also, here the duplicate rows were found only in Tem1.
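
If rows can occur more than twice, one option is to treat both matrices as multisets and cancel rows copy by copy. This is only a rough sketch, assuming Tem2 is a subset of Tem1 and that it does not matter which copy of a repeated row gets cancelled. `occurrence` is a made-up helper name, not a base R function, and the "_" and "#" separators are arbitrary:

```r
# Rebuild the example data from the thread.
V1 <- rep(c(rep(111, 5), rep(222, 5), rep(333, 5)), 2)
V2 <- c(1:23, 6, 7, 11, 4, 5, 6, 7)
Tem1 <- cbind(V1, V2)
Tem2 <- Tem1[c(1:11, 13:15, 18:19), ]

# Hypothetical helper: for each element of `key`, its occurrence number
# (1 the first time the key appears, 2 the second time, and so on).
occurrence <- function(key) ave(seq_along(key), key, FUN = seq_along)

key1 <- paste(Tem1[, 1], Tem1[, 2], sep = "_")
key2 <- paste(Tem2[, 1], Tem2[, 2], sep = "_")

# Tag every key with its occurrence number, so the 2nd copy of a row in
# Tem1 is only cancelled by a 2nd copy of that row in Tem2.
id1 <- paste(key1, occurrence(key1), sep = "#")
id2 <- paste(key2, occurrence(key2), sep = "#")

res <- Tem1[!id1 %in% id2, ]
res
```

With this approach a third or fourth copy is handled the same way as a second one, and the remaining rows stay in their original Tem1 order.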
A.K.

________________________________
From: HJ YAN <yhj204 at googlemail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: r-help at r-project.org 
Sent: Wednesday, March 6, 2013 5:36 PM
Subject: Re: [R] How to combine conditional argument and logical argument in R to create subset of data...


Hi Arun

Massive thanks for the hints of making use of 'paste0'!

But coincidentally, in the previous example there was no pair of rows that were exactly the same in indxTem1 and indxTem2. I have changed the data as below, to something that is very likely to occur in my real data...


V1<-rep(c(rep(111,5),rep(222,5),rep(333,5)),2)  # V1 here are some data index with lots of repeated numeric values
V2<-c(1:23, 6,7,11,4,5,6,7)  # there are also duplicated values in V2
Tem1<-cbind(V1,V2)
Tem2<-Tem1[c(1:11,13:15,18:19),] # I know that Tem2 is a subset of Tem1...


And my target outcome is the difference between Tem1 and Tem2 as below:


  V1 V2

 333 12
 111 16
 111 17
 111 20
 222 21
 222 22
 222 23
 222  6
 222  7
 333 11
 333  4
 333  5
 333  6
 333  7

Many thanks
HJ



On Wed, Mar 6, 2013 at 9:29 PM, arun <smartpink111 at yahoo.com> wrote:


>
>Hi,
>How about this:
>
>indxTem1<-paste0(Tem1[,1],Tem1[,2])
>indxTem2<-paste0(Tem2[,1],Tem2[,2])
>Tem1[!indxTem1%in%indxTem2,]
>#       V1 V2
># [1,] 333 11
># [2,] 111 16
># [3,] 111 17
># [4,] 111 20
># [5,] 222 21
># [6,] 222 22
># [7,] 222 23
># [8,] 222  1
># [9,] 222  2
>#[10,] 333  3
>#[11,] 333  4
>#[12,] 333  5
>#[13,] 333  6
>#[14,] 333  7
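
A caveat on the paste0() keys: without a separator, two different rows can produce the same key. That cannot happen with the data here, because V1 always has three digits, but in general it is probably safer to paste with an explicit separator. A small illustration with made-up values (the matrix `m` and the "_" separator are assumptions for the example only):

```r
# Two different rows that collide when pasted without a separator.
m <- rbind(c(11, 12), c(111, 2))

key_plain <- paste0(m[, 1], m[, 2])            # both keys are "1112"
key_sep   <- paste(m[, 1], m[, 2], sep = "_")  # "11_12" and "111_2"

key_plain
key_sep
```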
>
>
>
>A.K.
>________________________________
>From: HJ YAN <yhj204 at googlemail.com>
>To: arun <smartpink111 at yahoo.com>
>Cc: r-help at r-project.org
>Sent: Wednesday, March 6, 2013 4:09 PM
>
>Subject: Re: [R] How to combine conditional argument and logical argument in R to create subset of data...
>
>
>Dear Arun
>
>
>Thanks a million for your prompt reply and I love all four ways in your reply. 
>
>I tried the code and just realised an issue: in my real work the data is about 4GB, and I'm sure there are many duplicated values in V2, so my V1 and V2 should be something like
>
>
>V1<-rep(c(rep(111,5),rep(222,5),rep(333,5)),2)  # V1 here are some data index with lots of repeated numeric values
>V2<-c(1:23, 1:7)  # there are also duplicated values in V2
>Tem1<-cbind(V1,V2)
>Tem2<-Tem1[c(1:10,12:15,18:19),] # I know that Tem2 is a subset of Tem1...
>
>
>So how do I get the difference between Tem1 and Tem2 if the values in V2 contain duplicates?
>
>  V1 V2
> 333 11
> 111 16
> 111 17
> 111 20
> 222 21
> 222 22
> 222 23
> 222  1
> 222  2
> 333  3
> 333  4
> 333  5
> 333  6
> 333  7
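
To see why comparing on V2 alone is not enough once V2 repeats, it may help to count rows. A quick check on the data above, comparing a V2-only filter with a filter on the whole row (the "_" separator and the names `by_v2` and `by_row` are arbitrary choices for this sketch):

```r
V1 <- rep(c(rep(111, 5), rep(222, 5), rep(333, 5)), 2)
V2 <- c(1:23, 1:7)
Tem1 <- cbind(V1, V2)
Tem2 <- Tem1[c(1:10, 12:15, 18:19), ]

# Filter on V2 only: rows 24 to 30 are lost, because their V2 values
# (1 to 7) also occur in Tem2, although with a different V1.
by_v2 <- Tem1[!Tem1[, 2] %in% Tem2[, 2], ]

# Filter on the whole row, using one key per (V1, V2) pair.
rows1 <- paste(Tem1[, 1], Tem1[, 2], sep = "_")
rows2 <- paste(Tem2[, 1], Tem2[, 2], sep = "_")
by_row <- Tem1[!rows1 %in% rows2, ]

nrow(by_v2)
nrow(by_row)
```

The V2-only filter keeps 7 rows, while the row-wise filter keeps the 14 rows listed above.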
>
>
>Massive thanks
>HJ
>
>
>
>
>
>On Wed, Mar 6, 2013 at 4:12 PM, arun <smartpink111 at yahoo.com> wrote:
>
>
>>
>>Just to add:
>>
>>Tem1[Tem1[,2]%in%setdiff(Tem1[,2],Tem2[,2]),]
>>
>>A.K.
>>
>>----- Original Message -----
>>
>>From: arun <smartpink111 at yahoo.com>
>>To: HJ YAN <yhj204 at googlemail.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Wednesday, March 6, 2013 11:06 AM
>>Subject: Re: [R] How to combine conditional argument and logical argument in R to create subset of data...
>>
>>Hi,
>>No problem.
>>V1<-rep(c(rep(111,5),rep(222,5),rep(333,5)),2)
>> length(V1)
>>#[1] 30
>>
>> V2<- c(1:30) #should be the same length as V1
>>Tem1<- cbind(V1,V2)
>>Tem2<-Tem1[1:20,]
>>
>>Tem1[!Tem1[,2]%in%Tem2[,2],]
>> #      V1 V2
>> #[1,] 222 21
>> #[2,] 222 22
>> #[3,] 222 23
>> #[4,] 222 24
>> #[5,] 222 25
>> #[6,] 333 26
>> #[7,] 333 27
>> #[8,] 333 28
>> #[9,] 333 29
>>#[10,] 333 30
>>
>>#or
>>subset(Tem1,!V2%in% Tem2[,2])
>>#or
>> Tem1[is.na(match(Tem1[,2],Tem2[,2])),]
>> #      V1 V2
>> #[1,] 222 21
>> #[2,] 222 22
>> #[3,] 222 23
>> #[4,] 222 24
>> #[5,] 222 25
>> #[6,] 333 26
>> #[7,] 333 27
>> #[8,] 333 28
>> #[9,] 333 29
>>#[10,] 333 30
>>A.K.
>>
>>
>>
>>
>>________________________________
>>From: HJ YAN <yhj204 at googlemail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Wednesday, March 6, 2013 10:33 AM
>>Subject: Re: [R] How to combine conditional argument and logical argument in R to create subset of data...
>>
>>
>>Thank you SO MUCH Arun!!! 
>>
>>That's brilliant-- I've learnt some very useful new R commands now, e.g. 'do.call' and 'split'. And I see where my code went wrong now.
>>
>>I greatly appreciate your prompt reply.
>>
>>Also, I wonder if there is a package that can find the difference between two data frames, e.g. when one is a subset of the other? e.g.
>>
>> V1<-rep(c(rep(111,5),rep(222,5),rep(333,5)),2)
>> V2<-c(1:23)
>>Tem1<-cbind(V1,V2)
>>
>>Tem2<-Tem1[1:20,]
>>
>>
>>How do I get outcome like 
>>
>>[21,] 333 21
>>[22,] 333 22
>>[23,] 333 23
>>
>>
>>P.S. I used 'setdiff' before, but it seems to work only for vectors, not for data frames??
>>
>>
>>Sorry for so many questions today, as I'm coding for a work deadline tonight.
>>
>>
>>Many thanks!
>>Cheers
>>HJ
>>
>>
>>
>>
>>
>>
>>
>>On Wed, Mar 6, 2013 at 1:55 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>You can also try this:
>>> Tem3<- list()
>>> for(i in unique(Tem1[,1])) {
>>> Tem3[[as.character(i)]]<- subset(Tem1,Tem1[,1]==i)  # index by name, not by position 111, 222, 333
>>> }
>>> Tem4<- do.call(rbind,Tem3)  # combine once, after the loop
>>>head(Tem4)
>>>#      V1 V2
>>>#[1,] 111  1
>>>#[2,] 111  2
>>>#[3,] 111  3
>>>#[4,] 111  4
>>>#[5,] 111 13
>>>#[6,] 111 14
>>>
>>>
>>>#or
>>>Tem3<-c(NA,NA)
>>> for(i in unique(Tem1[,1])) {
>>> Tem2<- subset(Tem1, Tem1[,1]==i)
>>> Tem3<- rbind(Tem3,Tem2)
>>> }
>>> Tem5<- Tem3[-1,]  # drop the NA placeholder row once, after the loop
>>>head(Tem5)
>>>#  V1 V2
>>># 111  1
>>># 111  2
>>># 111  3
>>># 111  4
>>># 111 13
>>># 111 14
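
If the goal is only to group the rows by V1 while keeping their original order within each group, a loop may not be needed at all. A possible alternative, assuming the default `order()` method, which leaves ties in their original order (the name `grouped` is just for this sketch):

```r
V1 <- c(rep(111, 4), rep(222, 4), rep(333, 4), rep(111, 4), rep(222, 4), rep(333, 3))
V2 <- 1:23
Tem1 <- cbind(V1, V2)

# Sort the rows by V1; rows with the same V1 keep their relative order.
grouped <- Tem1[order(Tem1[, 1]), ]
head(grouped)
```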
>>>
>>>A.K.
>>>
>>>
>>>________________________________
>>>From: HJ YAN <yhj204 at googlemail.com>
>>>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: r-help at r-project.org
>>>Sent: Wednesday, March 6, 2013 8:24 AM
>>>Subject: Re: [R] How to combine conditional argument and logical argument in R to create subset of data...
>>>
>>>
>>>
>>>Hi Arun
>>>
>>>
>>>Thank you so much for the help, that's really helpful!!
>>>
>>>Also I have a quick question about the code below where I can not see why it doesn't work...
>>>
>>>I know the I shou
>>>
>>>V1<-c(rep(111,4),rep(222,4),rep(333,4),rep(111,4),rep(222,4),rep(333,3))
>>>V2<-c(1:23)
>>>Tem1<-cbind(V1,V2)
>>>
>>>
>>>So Tem1 looks like...
>>>> Tem1
>>>       V1 V2
>>> [1,] 111  1
>>> [2,] 111  2
>>> [3,] 111  3
>>> [4,] 111  4
>>> [5,] 222  5
>>> [6,] 222  6
>>> [7,] 222  7
>>> [8,] 222  8
>>> [9,] 333  9
>>>[10,] 333 10
>>>[11,] 333 11
>>>[12,] 333 12
>>>[13,] 111 13
>>>[14,] 111 14
>>>[15,] 111 15
>>>[16,] 111 16
>>>[17,] 222 17
>>>[18,] 222 18
>>>[19,] 222 19
>>>[20,] 222 20
>>>[21,] 333 21
>>>[22,] 333 22
>>>[23,] 333 23
>>>
>>>I would like the outcome to be...
>>>
>>>      V1 V2
>>>
>>>     111  1
>>>     111  2
>>>     111  3
>>>     111  4
>>>     111 13
>>>     111 14
>>>     111 15
>>>     111 16
>>>     222  5
>>>     222  6
>>>     222  7
>>>     222  8
>>>     222 17
>>>     222 18
>>>     222 19
>>>     222 20
>>>     333  9
>>>     333 10
>>>     333 11
>>>     333 12
>>>     333 21
>>>     333 22
>>>     333 23
>>>
>>>
>>>So I tried code as below 
>>>------------------------------------------
>>>Tem3<-c(NA,NA)
>>>for(i in length(unique(Tem1[,1]))){
>>>Tem2<-subset(Tem1,Tem1[,1]==unique(Tem1[,1])[i])
>>>Tem3<-rbind(Tem3,Tem2)
>>>Tem3
>>>}
>>>Tem4<-Tem3[-1,]
>>>---------------------------------------
>>>
>>>And only get this...
>>>
>>>
>>> V1 V2
>>> 333  9
>>> 333 10
>>> 333 11
>>> 333 12
>>> 333 21
>>> 333 22
>>> 333 23
>>>
>>>
>>>I tried running the code step by step, e.g. setting i=1, then i=2, then i=3, and updating my Tem3 each time, and I did get what I wanted, but I wonder why it did not work in the loop above...?
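
The reason the loop above only returns the 333 rows: `length(unique(Tem1[,1]))` is the single number 3, so `for(i in 3)` runs the body once, with i = 3. To visit every index, loop over `seq_along(...)` (or directly over the unique values). A minimal illustration, with `ids` standing in for `unique(Tem1[,1])` and the `visited_*` names made up for the example:

```r
ids <- c(111, 222, 333)

# Looping over length(ids) visits a single value, the length itself.
visited_wrong <- c()
for (i in length(ids)) visited_wrong <- c(visited_wrong, i)
visited_wrong

# Looping over seq_along(ids) visits every index in turn.
visited_right <- c()
for (i in seq_along(ids)) visited_right <- c(visited_right, i)
visited_right
```

The first loop records only 3, the second records 1, 2, 3, which matches what happens when the steps are run by hand.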
>>>
>>>
>>>Many thanks in advance!
>>>
>>>HJ
>>>
>>>
>>>On Wed, Mar 6, 2013 at 4:36 AM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>Hi,
>>>>
>>>> b[b[,4]>15 & (b[,1]>4|is.na(b[,1])) & (b[,2]>4|is.na(b[,2])),]
>>>> #    [,1] [,2] [,3] [,4] [,5]
>>>>#[1,]    6   NA   NA   16   20
>>>>#[2,]   NA    5   NA   17   21
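
Why `is.na()` is needed here: any comparison with NA, including `x == NA`, evaluates to NA rather than TRUE or FALSE, so a test written as `b[,1]==NA` can never select a row. A short check using the matrix from the original question (the names `cmp` and `keep` are only for illustration):

```r
b <- matrix(2:21, nrow = 4)
b[, 1:3] <- NA
b[4, 2] <- 5
b[3, 1] <- 6

# Comparing with NA gives NA for every element, never TRUE.
cmp <- b[, 1] == NA
cmp

# is.na() is the test that actually detects missing values.
is.na(b[, 1])

# Same condition as above, built step by step.
keep <- b[, 4] > 15 &
  (b[, 1] > 4 | is.na(b[, 1])) &
  (b[, 2] > 4 | is.na(b[, 2]))
b[keep, ]
```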
>>>>A.K.
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: HJ YAN <yhj204 at googlemail.com>
>>>>To: r-help at r-project.org
>>>>Cc:
>>>>Sent: Tuesday, March 5, 2013 9:33 PM
>>>>Subject: [R] How to combine conditional argument and logical argument in R to create subset of data...
>>>>
>>>>Dear R user
>>>>
>>>>I have data created using code below
>>>>
>>>>b<-matrix(2:21,nrow=4)
>>>>b[,1:3]=NA
>>>>b[4,2]=5
>>>>b[3,1]=6
>>>>
>>>>Now the data is
>>>>
>>>>> b
>>>>     [,1] [,2] [,3] [,4] [,5]
>>>>[1,]   NA   NA   NA   14   18
>>>>[2,]   NA   NA   NA   15   19
>>>>[3,]    6   NA   NA   16   20
>>>>[4,]   NA    5   NA   17   21
>>>>
>>>>
>>>>I want to keep the rows where column 4 is greater than 15 and the values in
>>>>columns 1 & 2 are each either greater than 4 or 'NA'. So I would like to have
>>>>my outcome as below...
>>>>
>>>>[3,]    6   NA   NA   16   20
>>>>[4,]   NA    5   NA   17   21
>>>>
>>>>I thought something like the code below would work, but it only returns
>>>>the last row, i.e. "NA 5 NA 17 21"...
>>>>
>>>>bb<-b[which( (b[,2]>4 | b[,2]==NA) & (b[,1]>4 | b[,1]==NA) & b[,4]>15) ,]
>>>>
>>>>
>>>>Please could anyone help?
>>>>
>>>>Many thanks in advance
>>>>
>>>>HJ
>>>>
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>     
>>>
>>
>>
>>
> 


