[R] create unique ID for each group

William Dunlap wdunlap at tibco.com
Tue May 7 21:34:51 CEST 2013


>I want to merge dat1 and dat2 based on "ID" in order, I know "match" only
>returns the first match it finds. So I am thinking of creating a unique ID
>col in dat1 and dat2, then merging.

You can make a new within-group sequence number with ave():

> dat1 <- read.table(text="
ObsNumber  ID    Weight
1          0001  12
2          0001  13
3          0001  14
4          0002  16
5          0002  17
6          N/A   18
", sep="", header=TRUE, colClasses=c("numeric","character","numeric"), na.strings="N/A")

> dat1$withinIDSeq <- ave(rep(NA_real_,nrow(dat1)), dat1$ID, FUN=seq_along)
> dat1$withinIDSeq
[1]  1  2  3  1  2 NA

Do the same for dat2, then use merge() with ID and withinIDSeq as the 'by'
columns, or paste the two together yourself and use the result as a single
'by' column.
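
A minimal sketch of both options follows. Note that dat2 and its Height
column are made up here for illustration (the original dat2 was not posted),
and the explicit order() call is only there so the row order of the result
does not depend on how merge() sorts its keys.

```r
# Hypothetical data: dat2 and its Height values are assumptions for illustration.
dat1 <- data.frame(ID = c("0001", "0001", "0001", "0002", "0002"),
                   Weight = c(12, 13, 14, 16, 17),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("0002", "0002", "0001", "0001", "0001"),
                   Height = c(4.1, 4.4, 3.2, 2.6, 3.0),
                   stringsAsFactors = FALSE)

# Number the rows within each ID, in order of appearance.
# Rows whose ID is NA would keep the NA they start with.
dat1$withinIDSeq <- ave(rep(NA_real_, nrow(dat1)), dat1$ID, FUN = seq_along)
dat2$withinIDSeq <- ave(rep(NA_real_, nrow(dat2)), dat2$ID, FUN = seq_along)

# Option 1: merge on the pair of columns.
merged <- merge(dat1, dat2, by = c("ID", "withinIDSeq"))
merged <- merged[order(merged$ID, merged$withinIDSeq), ]

# Option 2: paste the pair into a single key column and merge on that.
dat1$key <- paste(dat1$ID, dat1$withinIDSeq, sep = "_")
dat2$key <- paste(dat2$ID, dat2$withinIDSeq, sep = "_")
merged2 <- merge(dat1, dat2[c("key", "Height")], by = "key")
```

Either way, the k-th row for a given ID in dat1 is paired with the k-th row
for that ID in dat2, regardless of the order in which the IDs appear in the
two data frames.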

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Ye Lin
> Sent: Tuesday, May 07, 2013 11:38 AM
> To: Chris Stubben
> Cc: R help
> Subject: Re: [R] create unique ID for each group
> 
> In each category, the order is the same. For example, the first match in
> dat2 should return to the first record in dat2
> 
> 
> On Tue, May 7, 2013 at 11:31 AM, Chris Stubben <stubben at lanl.gov> wrote:
> 
> >> Yes, I tried, but the order of the IDs in dat1 and dat2 is not exactly the
> >> same, I simplify the data here. So in dat2, it may have records for ID=0002
> >> first then ID=0001, also I have more than two categories under ID col
> >>
> >
> > I should have looked at the question more closely, sorry.  Unique ids in raw
> > datasets are pretty important, especially if observations are split into
> > different files and you are trying to join them later.  How do you know for
> > ID 0001 and obs 1 that height is 3.2 and not 2.6, especially if the order in
> > the two files is "not exactly the same"?
> >
> >
> > Chris
> >
> >
> >
> > --
> >
> > Chris Stubben
> >
> > Los Alamos National Lab
> > Bioscience Division
> > MS M888
> > Los Alamos, NM 87545
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
