[R] adding an additional column for preserving uniqueness

William Dunlap wdunlap at tibco.com
Thu Jan 29 02:38:37 CET 2015


> with(dat1, ave(integer(length(Date)), Date, FUN=seq_along))
 [1] 1 1 2 1 1 2 1 2 1 2 1


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jan 28, 2015 at 4:54 PM, Morway, Eric <emorway at usgs.gov> wrote:

> The two datasets below are excerpts from much larger datasets.  Note that
> there are duplicate dates in both dat1 and dat2, e.g., "2009-10-14".
>
> dat1 <- read.table(textConnection("Date ConcAve
> 2009-07-08       7
> 2009-08-26       1
> 2009-08-26       2
> 2009-09-15       2
> 2009-10-14       2
> 2009-10-14       2
> 2009-10-16     101
> 2009-10-16      93
> 2009-11-18       4
> 2009-11-18       3
> 2010-01-04       4"),header=T)
> closeAllConnections()
>
> dat2 <- read.table(textConnection("Date  ConcAve
> 2009-08-26 4.84e-05
> 2009-09-15 4.58e-05
> 2009-10-14 3.86e-05
> 2009-10-14 3.55e-05
> 2009-10-16 3.07e-05
> 2009-10-16 2.35e-05
> 2009-11-18 2.00e-05
> 2009-11-18 1.96e-05
> 2010-01-04 1.52e-05
> 2010-01-04 1.53e-05
> 2010-02-10 2.23e-05"),header=T)
> closeAllConnections()
>
> I'm seeking an R operation that will append a third column to both
> data.frame's such that it makes these duplicates unique when I run merge().
> The desired result for dat1 would be:
>
>       Date ConcAve  item
> 2009-07-08       7     1
> 2009-08-26       1     1
> 2009-08-26       2     2
> 2009-09-15       2     1
> 2009-10-14       2     1
> 2009-10-14       2     2
> 2009-10-16     101     1
> 2009-10-16      93     2
> 2009-11-18       4     1
> 2009-11-18       3     2
> 2010-01-04       4     1
>
> this way, I don't get this:
>
> merge(dat1, dat2, by="Date")
> #         Date ConcAve.x ConcAve.y
> #1  2009-08-26         1  4.84e-05
> #2  2009-08-26         2  4.84e-05
> #3  2009-09-15         2  4.58e-05
> #4  2009-10-14         2  3.55e-05
> #5  2009-10-14         2  3.86e-05
> #6  2009-10-14         2  3.55e-05
> #7  2009-10-14         2  3.86e-05
> #8  2009-10-16       101  3.07e-05
> #9  2009-10-16       101  2.35e-05
> #10 2009-10-16        93  3.07e-05
> #11 2009-10-16        93  2.35e-05
> #12 2009-11-18         4  1.96e-05
> #13 2009-11-18         4  2.00e-05
> #14 2009-11-18         3  1.96e-05
> #15 2009-11-18         3  2.00e-05
> #16 2010-01-04         4  1.52e-05
> #17 2010-01-04         4  1.53e-05
>
> With the new column, which I've inserted manually in this small example, I
> instead get the merge result below, which is what I'm after for the larger
> problem:
>
> dat3 <- read.table(textConnection("Date ConcAve  item
> 2009-07-08       7     1
> 2009-08-26       1     1
> 2009-08-26       2     2
> 2009-09-15       2     1
> 2009-10-14       2     1
> 2009-10-14       2     2
> 2009-10-16     101     1
> 2009-10-16      93     2
> 2009-11-18       4     1
> 2009-11-18       3     2
> 2010-01-04       4     1"),header=T)
> closeAllConnections()
>
> dat4 <- read.table(textConnection("Date  ConcAve  item
> 2009-08-26 4.84e-05     1
> 2009-09-15 4.58e-05     1
> 2009-10-14 3.86e-05     1
> 2009-10-14 3.55e-05     2
> 2009-10-16 3.07e-05     1
> 2009-10-16 2.35e-05     2
> 2009-11-18 2.00e-05     1
> 2009-11-18 1.96e-05     2
> 2010-01-04 1.52e-05     1
> 2010-01-04 1.53e-05     2
> 2010-02-10 2.23e-05     1"),header=T)
> closeAllConnections()
>
> merge(dat3, dat4, by=c("Date","item"))
> #        Date item ConcAve.x ConcAve.y
> #1 2009-08-26    1         1  4.84e-05
> #2 2009-09-15    1         2  4.58e-05
> #3 2009-10-14    1         2  3.86e-05
> #4 2009-10-14    2         2  3.55e-05
> #5 2009-10-16    1       101  3.07e-05
> #6 2009-10-16    2        93  2.35e-05
> #7 2009-11-18    1         4  2.00e-05
> #8 2009-11-18    2         3  1.96e-05
> #9 2010-01-04    1         4  1.52e-05
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list