[R] adding an additional column for preserving uniqueness

Morway, Eric emorway at usgs.gov
Thu Jan 29 01:54:20 CET 2015


The two datasets below are excerpts from much larger datasets.  Note that
there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14".

dat1 <- read.table(textConnection("Date ConcAve
2009-07-08       7
2009-08-26       1
2009-08-26       2
2009-09-15       2
2009-10-14       2
2009-10-14       2
2009-10-16     101
2009-10-16      93
2009-11-18       4
2009-11-18       3
2010-01-04       4"),header=T)
closeAllConnections()

dat2 <- read.table(textConnection("Date  ConcAve
2009-08-26 4.84e-05
2009-09-15 4.58e-05
2009-10-14 3.86e-05
2009-10-14 3.55e-05
2009-10-16 3.07e-05
2009-10-16 2.35e-05
2009-11-18 2.00e-05
2009-11-18 1.96e-05
2010-01-04 1.52e-05
2010-01-04 1.53e-05
2010-02-10 2.23e-05"),header=T)
closeAllConnections()

I'm seeking an R operation that will append a third column to both
data.frame's such that it makes these duplicates unique when I run merge().
The desired result for dat1 would be:

      Date ConcAve  item
2009-07-08       7     1
2009-08-26       1     1
2009-08-26       2     2
2009-09-15       2     1
2009-10-14       2     1
2009-10-14       2     2
2009-10-16     101     1
2009-10-16      93     2
2009-11-18       4     1
2009-11-18       3     2
2010-01-04       4     1

this way, I don't get this:

merge(dat1, dat2, by="Date")
#         Date ConcAve.x ConcAve.y
#1  2009-08-26         1  4.84e-05
#2  2009-08-26         2  4.84e-05
#3  2009-09-15         2  4.58e-05
#4  2009-10-14         2  3.55e-05
#5  2009-10-14         2  3.86e-05
#6  2009-10-14         2  3.55e-05
#7  2009-10-14         2  3.86e-05
#8  2009-10-16       101  3.07e-05
#9  2009-10-16       101  2.35e-05
#10 2009-10-16        93  3.07e-05
#11 2009-10-16        93  2.35e-05
#12 2009-11-18         4  1.96e-05
#13 2009-11-18         4  2.00e-05
#14 2009-11-18         3  1.96e-05
#15 2009-11-18         3  2.00e-05
#16 2010-01-04         4  1.52e-05
#17 2010-01-04         4  1.53e-05

With the new column, which I've inserted manually in this small example, I
instead get the merge result below, which is what I'm after for the larger
problem:

dat3 <- read.table(textConnection("Date ConcAve  item
2009-07-08       7     1
2009-08-26       1     1
2009-08-26       2     2
2009-09-15       2     1
2009-10-14       2     1
2009-10-14       2     2
2009-10-16     101     1
2009-10-16      93     2
2009-11-18       4     1
2009-11-18       3     2
2010-01-04       4     1"),header=T)
closeAllConnections()

dat4 <- read.table(textConnection("Date  ConcAve  item
2009-08-26 4.84e-05     1
2009-09-15 4.58e-05     1
2009-10-14 3.86e-05     1
2009-10-14 3.55e-05     2
2009-10-16 3.07e-05     1
2009-10-16 2.35e-05     2
2009-11-18 2.00e-05     1
2009-11-18 1.96e-05     2
2010-01-04 1.52e-05     1
2010-01-04 1.53e-05     2
2010-02-10 2.23e-05     1"),header=T)
closeAllConnections()

merge(dat3, dat4, by=c("Date","item"))
#        Date item ConcAve.x ConcAve.y
#1 2009-08-26    1         1  4.84e-05
#2 2009-09-15    1         2  4.58e-05
#3 2009-10-14    1         2  3.86e-05
#4 2009-10-14    2         2  3.55e-05
#5 2009-10-16    1       101  3.07e-05
#6 2009-10-16    2        93  2.35e-05
#7 2009-11-18    1         4  2.00e-05
#8 2009-11-18    2         3  1.96e-05
#9 2010-01-04    1         4  1.52e-05

	[[alternative HTML version deleted]]



More information about the R-help mailing list