[R] Merging Issue

jim holtman jholtman at gmail.com
Sun Jun 19 00:41:34 CEST 2016


Don't use HTML when sending email; it messes up the data.

What do you mean that you get lots of duplicates?  If you have duplicated
entries in df2 this will lead to dups because of the way merge works (here
is the help file):

 If there is more than one match, all possible matches contribute
     one row each.  For the precise meaning of ‘match’, see ‘match’.

So you need to define the problem that you want to solve in doing the
merge.  Here is what happens in your data if I duplicate some entries in
df2; is this what you are seeing?

>  #Data A
>  Subject<- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
>  dates<-seq(as.Date('2011-01-01'),as.Date('2011-01-12'),by = 1)
>  deps<-c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
>  df <- data.frame(Subject, dates, deps)
>  ##
>  #Data B
>  loc<-c("CA","NY", "CA", "NY", "WA", "WA", 'yy')
>  grp<-c("DE", "OC", "DE", "OT", "DE", "OC", "xx")
>  deps<-c("A","B","C", "D", "F","G", "A")
>  df2<-data.frame(loc, grp, deps )
>  dat<-merge(df, df2, by="deps")
>
> dat
   deps Subject      dates loc grp
1     A       2 2011-01-01  CA  DE
2     A       2 2011-01-01  yy  xx
3     A       3 2011-01-06  CA  DE
4     A       3 2011-01-06  yy  xx
5     A       5 2011-01-11  CA  DE
6     A       5 2011-01-11  yy  xx
7     A       5 2011-01-09  CA  DE
8     A       5 2011-01-09  yy  xx
9     B       2 2011-01-02  NY  OC
10    C       3 2011-01-04  CA  DE
11    C       2 2011-01-03  CA  DE
12    D       5 2011-01-12  NY  OT
13    D       3 2011-01-05  NY  OT
14    F       5 2011-01-10  WA  DE
15    F       4 2011-01-07  WA  DE
16    G       4 2011-01-08  WA  OC
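
If that is what you are seeing, one way to track it down is to count how
often each key occurs on both sides before merging.  The following is only a
sketch built on the example data above; the names key_counts, dup_keys,
expected_rows and df2_unique are made up for illustration, and keeping the
first row per key is an assumption that may not suit your real data.

```r
# Rebuild the example data used in this message
Subject <- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
dates <- seq(as.Date("2011-01-01"), as.Date("2011-01-12"), by = 1)
deps <- c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
df <- data.frame(Subject, dates, deps)

df2 <- data.frame(
  loc  = c("CA", "NY", "CA", "NY", "WA", "WA", "yy"),
  grp  = c("DE", "OC", "DE", "OT", "DE", "OC", "xx"),
  deps = c("A", "B", "C", "D", "F", "G", "A")
)

# Keys that occur more than once in the lookup table df2
key_counts <- table(df2$deps)
dup_keys <- names(key_counts[key_counts > 1])
dup_keys

# Each key contributes (rows in df) * (rows in df2) rows to the merge
n_df <- table(df$deps)
common <- intersect(names(n_df), names(key_counts))
expected_rows <- sum(as.integer(n_df[common]) * as.integer(key_counts[common]))
expected_rows
nrow(merge(df, df2, by = "deps"))

# One possible fix (assumption: the first row per key is the one wanted)
df2_unique <- df2[!duplicated(df2$deps), ]
dat_unique <- merge(df, df2_unique, by = "deps")
nrow(dat_unique)
```

On this example dup_keys is "A", the merge has 16 rows, and the merge against
the de-duplicated df2 has 12 rows, one per row of df.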



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 17, 2016 at 8:33 PM, Farnoosh Sheikhi via R-help <
r-help at r-project.org> wrote:

> Hi all,
> I have two data sets similar to the ones below and want to merge them on
> the variable "deps". As this is sample data with a small sample size, I
> don't have any problem using the merge command. However, the actual data
> set has ~60,000 observations with a lot of repeated measures. For example,
> for a given ID I have 100 different dates and groups. The problem is that
> using the "merge" command gives me a lot of duplicates that I can't even
> track. I was wondering if there is any other way to merge such data. Any
> help is appreciated. Thanks.
> ## Data A
> Subject<- c("2", "2", "2", "3", "3", "3", "4", "4", "5", "5", "5", "5")
> dates<-seq(as.Date('2011-01-01'),as.Date('2011-01-12'),by = 1)
> deps<-c("A", "B", "C", "C", "D", "A", "F", "G", "A", "F", "A", "D")
> df <- data.frame(Subject, dates, deps)
> ## Data B
> loc<-c("CA","NY", "CA", "NY", "WA", "WA")
> grp<-c("DE", "OC", "DE", "OT", "DE", "OC")
> deps<-c("A","B","C", "D", "F","G")
> df2<-data.frame(loc, grp, deps )
> dat<-merge(df, df2, by="deps")
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
