[R] create unique ID for each group
arun
smartpink111 at yahoo.com
Tue May 7 22:25:57 CEST 2013
Hi,
Try this:
dat1<- read.table(text="
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
6 N/A 18
7 0003 19
8 N/A 20
9 0003 21
",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
dat2<- read.table(text="
ID Height
0001 3.2
0001 2.6
0001 3.2
0002 2.2
0002 2.6
",sep="",header=TRUE,colClass=c("character","numeric"))
dat1[!is.na(dat1$ID),"UniqueID"]<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
library(plyr)
join(dat1,dat2,by="UniqueID",type="left")
# ObsNumber ID Weight UniqueID ID Height
#1 1 0001 12 0001_1 0001 3.2
#2 2 0001 13 0001_2 0001 2.6
#3 3 0001 14 0001_3 0001 3.2
#4 4 0002 16 0002_1 0002 2.2
#5 5 0002 17 0002_2 0002 2.6
#6 6 <NA> 18 <NA> <NA> NA
#7 7 0003 19 0003_1 <NA> NA
#8 8 <NA> 20 <NA> <NA> NA
#9 9 0003 21 0003_2 <NA> NA
A.K.
________________________________
From: Ye Lin <yelin at lbl.gov>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, May 7, 2013 4:05 PM
Subject: Re: [R] create unique ID for each group
Yes, I need to keep the N/A records.
On Tue, May 7, 2013 at 1:00 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi
>Do you need an output like this?
> merge(dat1,dat2,by="UniqueID",all.x=TRUE)
> UniqueID ObsNumber ID.x Weight ID.y Height
>1 0001_1 1 0001 12 0001 3.2
>2 0001_2 2 0001 13 0001 2.6
>3 0001_3 3 0001 14 0001 3.2
>4 0002_1 4 0002 16 0002 2.2
>5 0002_2 5 0002 17 0002 2.6
>6 <NA> 6 <NA> 18 <NA> NA
>
>when you use:
>
>
> dat1
> ObsNumber ID Weight UniqueID
>1 1 0001 12 0001_1
>2 2 0001 13 0001_2
>3 3 0001 14 0001_3
>4 4 0002 16 0002_1
>5 5 0002 17 0002_2
>6 6 <NA> 18 <NA>
>
> dat2
> ID Height UniqueID
>1 0001 3.2 0001_1
>2 0001 2.6 0001_2
>3 0001 3.2 0001_3
>4 0002 2.2 0002_1
>5 0002 2.6 0002_2
>
>
>
>
>________________________________
>From: Ye Lin <yelin at lbl.gov>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, May 7, 2013 3:41 PM
>
>Subject: Re: [R] create unique ID for each group
>
>
>
>If the ID="N/A" then when merge, there would be any match and can return N/A
>
>I use merge(dat1, dat2, by="UniqueID", all.x=TRUE),then an extra row will be added to the output for each case in dat1 that has no matching cases in dat2
>
>I just have to leave the records in dat1 even ID=N/A
>
>
>
>
>On Tue, May 7, 2013 at 12:38 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Also, another problem might be where do you assign those rows with missing ID. It could be the missing value for any ID.
>>For example in this case:
>>
>>dat1<- read.table(text="
>> ObsNumber ID Weight
>> 1 0001 12
>> 2 0001 13
>> 3 0001 14
>> 4 0002 16
>> 5 0002 17
>> 6 N/A 18
>> 7 0003 19
>> 8 0003 20
>>
>> ",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>> dat1
>> ObsNumber ID Weight
>>1 1 0001 12
>>2 2 0001 13
>>3 3 0001 14
>>4 4 0002 16
>>5 5 0002 17
>>6 6 <NA> 18
>>7 7 0003 19
>>8 8 0003 20
>>
>>
>> The missing ID could be either "0002" or "0003".
>>
>>
>>
>>
>>
>>
>>----- Original Message -----
>>From: arun <smartpink111 at yahoo.com>
>>To: Ye Lin <yelin at lbl.gov>
>>Cc:
>>
>>Sent: Tuesday, May 7, 2013 3:32 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>If you modify with na.strings="N/A", IDs with missing values will be read correctly. Otherwise, it is just a character string. BTW, if you need rows with NAs, then what will be your UniqueIDs you expect for those rows?
>>
>>
>>
>>
>>
>>
>>
>>________________________________
>>From: Ye Lin <yelin at lbl.gov>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Tuesday, May 7, 2013 3:25 PM
>>Subject: Re: [R] create unique ID for each group
>>
>>
>>
>>I do need rows with "NA". I already read the data in R, so do I assume I need to modify dat1 first with na.strings="N/A" ?
>>
>>
>>
>>On Tue, May 7, 2013 at 12:16 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>Hi,
>>>Do you need that row with "N/A". The code I sent will remove that row. If you don't use "na.strings="N/A", then it is not read NA, but some other character. That is the reason you got results like:
>>>
>>> unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2" "N/A_1"
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Ye Lin <yelin at lbl.gov>
>>>Cc: R help <r-help at r-project.org>
>>>
>>>Sent: Tuesday, May 7, 2013 3:13 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>HI Ye,
>>>
>>>For the NA in ID column,
>>>
>>>
>>>
>>>Hi
>>>dat1<- read.table(text="
>>>ObsNumber ID Weight
>>> 1 0001 12
>>> 2 0001 13
>>> 3 0001 14
>>> 4 0002 16
>>> 5 0002 17
>>> 6 N/A 18
>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"),na.strings="N/A")
>>> unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>#[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
>>>A.K.
>>>________________________________
>>>From: Ye Lin <yelin at lbl.gov>
>>>To: arun <smartpink111 at yahoo.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Tuesday, May 7, 2013 2:54 PM
>>>Subject: Re: [R] create unique ID for each group
>>>
>>>
>>>
>>>Thanks A.K. But I have "NA" in ID column, so when I apply the code, it gives me error saying the replacement as less rows than the data has. Anyway for ID=N/A, return sth like "N/A_1" in order as well?
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Tue, May 7, 2013 at 11:17 AM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>H,
>>>>Sorry, a mistake:
>>>>dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>dat1
>>>> # ObsNumber ID Weight UniqueID
>>>>#1 1 0001 12 0001_1
>>>>#2 2 0001 13 0001_2
>>>>#3 3 0001 14 0001_3
>>>>#4 4 0002 16 0002_1
>>>>#5 5 0002 17 0002_2
>>>>
>>>>dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x) with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
>>>>
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>
>>>>From: arun <smartpink111 at yahoo.com>
>>>>To: Ye Lin <yelin at lbl.gov>
>>>>Cc: R help <r-help at r-project.org>
>>>>Sent: Tuesday, May 7, 2013 2:10 PM
>>>>Subject: Re: [R] create unique ID for each group
>>>>
>>>>
>>>>
>>>>Hi,
>>>>
>>>>Try this:
>>>>dat1<- read.table(text="
>>>>ObsNumber ID Weight
>>>> 1 0001 12
>>>> 2 0001 13
>>>> 3 0001 14
>>>> 4 0002 16
>>>> 5 0002 17
>>>>",sep="",header=TRUE,colClass=c("numeric","character","numeric"))
>>>>dat2<- read.table(text="
>>>>ID Height
>>>>0001 3.2
>>>>0001 2.6
>>>>0001 3.2
>>>>0002 2.2
>>>>0002 2.6
>>>>",sep="",header=TRUE,colClass=c("character","numeric"))
>>>>dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
>>>> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
>>>> dat2
>>>># ID Height UniqueID
>>>>#1 0001 3.2 0001_1
>>>>#2 0001 2.6 0001_2
>>>>#3 0001 3.2 0001_3
>>>>#4 0002 2.2 0002_4
>>>>#5 0002 2.6 0002_5
>>>>A.K.
>>>>
>>>>
>>>>
>>>>----- Original Message -----
>>>>From: Ye Lin <yelin at lbl.gov>
>>>>To: R help <r-help at r-project.org>
>>>>Cc:
>>>>Sent: Tuesday, May 7, 2013 1:54 PM
>>>>Subject: [R] create unique ID for each group
>>>>
>>>>Hey All,
>>>>
>>>>I have a dataset(dat1) like this:
>>>>
>>>>ObsNumber ID Weight
>>>> 1 0001 12
>>>> 2 0001 13
>>>> 3 0001 14
>>>> 4 0002 16
>>>> 5 0002 17
>>>>
>>>>And another dataset(dat2) like this:
>>>>
>>>>ID Height
>>>>0001 3.2
>>>>0001 2.6
>>>>0001 3.2
>>>>0002 2.2
>>>>0002 2.6
>>>>
>>>>I want to merge dat1 and dat2 based on "ID" in order, I know "match" only
>>>>returns the first match it finds. So I am thinking create unique ID col in
>>>>dat2 and dat2, then merge. But I dont know how to do that so it can be like
>>>>this:
>>>>
>>>>dat1:
>>>>
>>>>ObsNumber ID Weight UniqueID
>>>> 1 0001 12 0001_1
>>>> 2 0001 13 0001_2
>>>> 3 0001 14 0001_3
>>>> 4 0002 16 0002_1
>>>> 5 0002 17 0002_1
>>>>
>>>>dat2:
>>>>
>>>>ID Height UniqueID
>>>>0001 3.2 0001_1
>>>>0001 2.6 0001_2
>>>>0001 3.2 0001_3
>>>>0002 2.2 0002_1
>>>>0002 2.6 0002_2
>>>>
>>>>Or if it is possible to merge dat1 and dat2 by matching "ID" but return the
>>>>match in order that would be great!
>>>>
>>>>Thanks for your help!
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>
More information about the R-help
mailing list