[R] How can I get the Ids with Duplicated key and corresponding Ids with original key?

David Winsemius dwinsemius at comcast.net
Mon Aug 13 17:48:19 CEST 2012


On Aug 13, 2012, at 4:07 AM, Sri krishna Devarayalu Balanagu wrote:

> Thank you for the quick response.
> But I want those duplicated with Ids in a separate vector like  
> Duplicated.ids in the below example?
> Duplication should be checked for Publication and Reference  
> combination, not on a single variable.
>
First case:
If you wanted just the ones that came _after_ the initial instances
then this would apply:

Duplicated.ids<- df.key[duplicated(key), c("Id")]

The logical vector that comes back from duplicated() will be the same
length as the number of rows of df.key (or of df, for that matter).
You could also have skipped the creation of key and just done this:

Duplicated.ids<- df[ duplicated( df[ , c("Publication",  
"Reference")] ), c("Id") ]
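
As a check on the two forms above, here is a minimal sketch run against
the example data from the original post. The names ids.from.key and
ids.from.df are made up for this illustration only.

```r
# Example data from the original post
df <- data.frame(
    Publication = c(1, 2, 3, 1, 4, 5, 2, 3),
    Reference   = c("a", "b", "c", "a", "d", "e", "b", "c"),
    Id          = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Form 1: build a pasted key, then flag the repeats of that key
key <- paste(df$Publication, df$Reference, sep = "_")
df.key <- cbind(key, df)
ids.from.key <- df.key[duplicated(key), c("Id")]

# Form 2: skip the key and let duplicated() compare whole rows
# of the two-column data frame
ids.from.df <- df[duplicated(df[, c("Publication", "Reference")]), c("Id")]

print(ids.from.key)   # 4 7 8
print(ids.from.df)    # 4 7 8
```

Both forms return only the later instances (Ids 4, 7 and 8); the first
occurrences (Ids 1, 2 and 3) are not flagged.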

------------------
Second case:
If you wanted both the later instances _and_ the first instances, you
could use this method, offered by Bill Dunlap in these pages within the
last week, if memory serves.

Duplicated.ids<- df.key[ duplicated(key) |  duplicated(key,  
fromLast=TRUE), c("Id")]

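The second condition, duplicated(key, fromLast=TRUE), scans from the
end and so flags the earlier occurrences. Joining it to the first
condition with an OR connector brings in the first instances as well.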
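
A small sketch of what the two conditions flag on the example data from
the original post. The names later, earlier and all.dup.ids are
hypothetical and used only for this illustration.

```r
# Example data from the original post
df <- data.frame(
    Publication = c(1, 2, 3, 1, 4, 5, 2, 3),
    Reference   = c("a", "b", "c", "a", "d", "e", "b", "c"),
    Id          = c(1, 2, 3, 4, 5, 6, 7, 8)
)
key <- paste(df$Publication, df$Reference, sep = "_")

# Scanning forward flags every occurrence after the first one
later <- duplicated(key)

# Scanning backward flags every occurrence before the last one
earlier <- duplicated(key, fromLast = TRUE)

# Either condition: every row whose key occurs more than once
all.dup.ids <- df$Id[later | earlier]

print(df$Id[later])     # 4 7 8
print(df$Id[earlier])   # 1 2 3
print(all.dup.ids)      # 1 2 3 4 7 8
```

Ids 5 and 6 have keys that occur only once, so neither condition picks
them up.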

See ?duplicated for further detail and worked examples.
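
If what is wanted is each duplicated Id next to the Id it duplicates
(Id 4 with Id 1 in the original example), one possible approach, not
given in the replies above, is to use match() to find the row of the
first occurrence of each key. This is only a sketch; the names dup,
first.row and pairs are made up for the illustration.

```r
# Example data from the original post
df <- data.frame(
    Publication = c(1, 2, 3, 1, 4, 5, 2, 3),
    Reference   = c("a", "b", "c", "a", "d", "e", "b", "c"),
    Id          = c(1, 2, 3, 4, 5, 6, 7, 8)
)
key <- paste(df$Publication, df$Reference, sep = "_")

# TRUE for every occurrence of a key after its first one
dup <- duplicated(key)

# match(key, key) gives, for each row, the row number at which
# that row's key first appears
first.row <- match(key, key)

# Pair each later instance with the Id of its first instance
pairs <- data.frame(
    Duplicated.id = df$Id[dup],
    Original.id   = df$Id[first.row[dup]]
)
print(pairs)
```

On the example data this pairs Id 4 with 1, Id 7 with 2 and Id 8 with 3.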
-- 
David.

>
> Regards
> Rayalu
> -----Original Message-----
> From: Jim Lemon [mailto:jim at bitwrit.com.au]
> Sent: Monday, August 13, 2012 3:37 PM
> To: Sri krishna Devarayalu Balanagu
> Cc: r-help at r-project.org
> Subject: Re: [R] How can I get the Ids with Duplicated key and  
> corresponding Ids with original key?
>
> On 08/13/2012 07:17 PM, Sri krishna Devarayalu Balanagu wrote:
>>
>> In this following example Id 4 is duplicated with Id 1.
>> Like this I want both Ids (Duplicated and Duplicated with). Can  
>> anyone help?
>>
>> df<- data.frame(
>>     "Publication" = c(1, 2, 3, 1, 4, 5, 2, 3),
>>     "Reference"   = c("a", "b", "c", "a", "d", "e", "b", "c"),
>>     "Id"= c(1, 2, 3, 4, 5, 6, 7, 8)
>>                  )
>>
>> key<- paste(df$Publication, df$Reference, sep="_")
>> df.key<- cbind(key, df)
>> Duplicated.ids<- df.key[duplicated(df.key$key), c("Id")]
>>
> Hi Sri krishna Devarayalu Balanagu,
> Does this do it?
>
> cat("Id Publication(s)\n")
> for(pub in unique(df$Publication))
>  cat(pub,"-",df$Id[which(df$Publication==pub)],"\n")
>
> Jim

David Winsemius, MD
Alameda, CA, USA


