[R] Difficulty with 'merge'
Christoph Buser
buser at stat.math.ethz.ch
Thu Jan 5 09:45:28 CET 2006
Dear Michael
Please note that merge() returns all possible combinations when
the key contains repeated values, as you can see in the example below.
?merge
"... If there is more than one match, all possible matches
contribute one row each. ..."
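A minimal sketch of that sentence, using made-up data frames x and y
(these names and values are only for illustration, they are not from
your data). The key "a" appears once in x and twice in y, so it
contributes two rows; anyDuplicated() is one way to check for repeated
keys before merging:

```r
# Hypothetical toy data: "a" is repeated in y but not in x.
x <- data.frame(key = c("a", "b"), left = 1:2)
y <- data.frame(key = c("a", "a", "b"), right = 11:13)

# Every match contributes one row: "a" matches twice, "b" once.
m <- merge(x, y, by = "key")
nrow(m)

# anyDuplicated() returns 0 when all keys are unique, otherwise the
# index of the first repeated value.
has_repeats <- anyDuplicated(y$key) > 0
has_repeats
```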
Maybe you can apply "aggregate" in a reasonable way to your
data.frame first, to summarize the repeated values into unique
ones, and then proceed with merge, but that depends on your
problem.
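A rough sketch of what I mean, on the same toy values as in the example
at the end of this message. Summarizing by the mean is purely an
assumption here; whether a mean (or any single summary) makes sense
depends on your data:

```r
# Toy data with repeated keys.
f1 <- data.frame(v1 = c("a", "b", "a", "b", "a"), n1 = 1:5)
f2 <- data.frame(v2 = c("b", "b", "a", "a", "a"), n2 = 6:10)

# Collapse each data frame to one row per key.
# FUN = mean is an assumption; pick whatever summary suits the problem.
a1 <- aggregate(n1 ~ v1, data = f1, FUN = mean)
a2 <- aggregate(n2 ~ v2, data = f2, FUN = mean)

# The keys are now unique on both sides, so merge() returns
# exactly one row per key.
m <- merge(a1, a2, by.x = "v1", by.y = "v2")
m
```

With unique keys on both sides, the merged result has as many rows as
there are common keys.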
Regards,
Christoph
--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
example with repeated values
----------------------------
v1 <- c("a", "b", "a", "b", "a")
n1 <- 1:5
v2 <- c("b", "b", "a", "a", "a")
n2 <- 6:10
(f1 <- data.frame(v1, n1))
(f2 <- data.frame(v2, n2))
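## "a" occurs 3 times in each data frame and "b" twice,
## so the result has 3*3 + 2*2 = 13 rows
(m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = FALSE))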
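
The number of rows can be predicted from the key counts: for each key
present in both data frames, multiply the number of occurrences on each
side, then add up the products. The following is only a sketch of that
arithmetic (the helper names t1, t2, keys and expected are mine). By the
same reasoning, a name that occurs 41 times in both data frames gives
41 * 41 = 1681 rows.

```r
# Same toy data as in the example above.
f1 <- data.frame(v1 = c("a", "b", "a", "b", "a"), n1 = 1:5)
f2 <- data.frame(v2 = c("b", "b", "a", "a", "a"), n2 = 6:10)
m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = FALSE)

# Count how often each key occurs on each side.
t1 <- table(f1$v1)
t2 <- table(f2$v2)

# Only keys present on both sides produce rows in the default merge.
keys <- intersect(names(t1), names(t2))

# Sum over keys of (count in f1) * (count in f2).
expected <- sum(as.vector(t1[keys]) * as.vector(t2[keys]))
expected
nrow(m12) == expected
```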
Michael Kubovy writes:
> Dear R-helpers,
>
> Happy New Year to all the helpful members of the list.
>
> Here is the behavior I'm looking for:
> > v1 <- c("a","b","c")
> > n1 <- c(0, 1, 2)
> > v2 <- c("c", "a", "b")
> > n2 <- c(0, 1 , 2)
> > (f1 <- data.frame(v1, n1))
> v1 n1
> 1 a 0
> 2 b 1
> 3 c 2
> > (f2 <- data.frame(v2, n2))
> v2 n2
> 1 c 0
> 2 a 1
> 3 b 2
> > (m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = F))
> v1 n1 n2
> 1 c 2 0
> 2 a 0 1
> 3 b 1 2
>
> Now to my data:
> > summary(pL)
> pairL
> a fondo : 41
> alto : 41
> ampio : 41
> angoloso : 41
> aperto : 41
> appoggiato: 41
> (Other) :1271
>
> > pL$pairL[c(1,42)]
> [1] appoggiato dentro
> 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico
> complicato convesso davanti dentro destra ... verticale
>
> > summary(oppN)
> pairL pairR subject L LL RR M
> a fondo : 41 a galla : 41 S1 : 37 Min. :0.3646 Min. :0.02083 Min. :0.0010 Min. :0.0000
> alto : 41 acuto : 41 S10 : 37 1st Qu.:0.5521 1st Qu.:0.37500 1st Qu.:0.1771 1st Qu.:0.1042
> ampio : 41 arrotondato: 41 S11 : 37 Median :0.6354 Median :0.47917 Median :0.2708 Median :0.2292
> angoloso : 41 basso : 41 S12 : 37 Mean :0.6403 Mean :0.46452 Mean :0.2760 Mean :0.2598
> aperto : 41 chiuso : 41 S13 : 37 3rd Qu.:0.7188 3rd Qu.:0.55208 3rd Qu.:0.3750 3rd Qu.:0.3854
> appoggiato: 41 compl : 41 S14 : 37 Max. :0.9375 Max. :0.92708 Max. :0.6042 Max. :0.7812
> (Other) :1271 (Other) :1271 (Other):1295 NA's :3.0000 NA's :3.0000
> asym polar polar_a1 clust
> Min. :-0.5555 Min. :-1.2410 Min. :-2.949e+00 c1:492
> 1st Qu.: 0.2091 1st Qu.: 0.4571 1st Qu.:-1.902e-01 c2:287
> Median : 0.5555 Median : 1.1832 Median :-1.110e-16 c3: 82
> Mean : 0.6265 Mean : 1.3428 Mean :-5.745e-02 c4:246
> 3rd Qu.: 0.9383 3rd Qu.: 2.0712 3rd Qu.: 1.168e-01 c5: 82
> Max. : 2.7081 Max. : 4.6151 Max. : 4.218e+00 c6:328
> NA's : 3.0000 NA's : 3.000e+00
>
> > oppN$pairL[c(1,42)]
> [1] spesso fine
> 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico
> complicato convesso davanti dentro destra ... verticale
>
> > unique(sort(oppM$pairL)) == unique(sort(pL$pairL))
> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [26] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
>
> In other words, I think that pL$pairL and oppN$pairL consist of 37
> blocks of 41 repetitions of names, and that these blocks are
> permutations of each other.
>
> However:
>
> > summary(m1 <- merge(oppM, pairL, by.x = "pairL", by.y = "pairL", sort = F))
> pairL pairR subject L LL RR M
> a fondo : 1681 a galla : 1681 S1 : 1517 Min. :0.3646 Min. :0.02083 Min. :0.0010 Min. :0.0000
> alto : 1681 acuto : 1681 S10 : 1517 1st Qu.:0.5521 1st Qu.:0.37500 1st Qu.:0.1771 1st Qu.:0.1042
> ampio : 1681 arrotondato: 1681 S11 : 1517 Median :0.6354 Median :0.47917 Median :0.2708 Median :0.2292
> angoloso : 1681 basso : 1681 S12 : 1517 Mean :0.6398 Mean :0.46402 Mean :0.2760 Mean :0.2598
> aperto : 1681 chiuso : 1681 S13 : 1517 3rd Qu.:0.7188 3rd Qu.:0.55208 3rd Qu.:0.3750 3rd Qu.:0.3854
> appoggiato: 1681 compl : 1681 S14 : 1517 Max. :0.9375 Max. :0.92708 Max. :0.6042 Max. :0.7812
> (Other) :51988 (Other) :51988 (Other):52972
> asym polar polar_a1 clust
> Min. :-0.5555 Min. :-1.2410 Min. :-2.949e+00 c1:20172
> 1st Qu.: 0.2091 1st Qu.: 0.4571 1st Qu.:-1.904e-01 c2:11644
> Median : 0.5555 Median : 1.1832 Median :-1.110e-16 c3: 3362
> Mean : 0.6234 Mean : 1.3428 Mean :-5.745e-02 c4:10086
> 3rd Qu.: 0.9383 3rd Qu.: 2.0712 3rd Qu.: 1.169e-01 c5: 3362
> Max. : 2.7081 Max. : 4.6151 Max. : 4.218e+00 c6:13448
>
> I was expecting pairL to be 41 items long, not 1681 = 41^2.
> _____________________________
> Professor Michael Kubovy
> University of Virginia
> Department of Psychology
> USPS: P.O.Box 400400 Charlottesville, VA 22904-4400
> Parcels: Room 102 Gilmer Hall
> McCormick Road Charlottesville, VA 22903
> Office: B011 +1-434-982-4729
> Lab: B019 +1-434-982-4751
> Fax: +1-434-982-4766
> WWW: http://www.people.virginia.edu/~mk9y/