[R] 'merge' function creating duplicate columns names in the output
jim holtman
jholtman at gmail.com
Thu Mar 3 23:04:52 CET 2011
The "merge" command is creating duplicate column names in a dataframe
that is the result of the merge. The following is the 'merge'
command:
x <- merge(invType
, allocSlots
, by.x = 'index'
, by.y = 'indx'
, all.x = TRUE
)
The 'invType' dataframe was the result of a previous merge and has the
following column names that are probably causing the problem:
height.x
height.y
height
> str(invType)
'data.frame': 2219 obs. of 30 variables:
$ loc : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
$ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
$ comm : int 451 57 560 40 560 39 560 40 560 46 ...
$ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
$ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH
FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...
$ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
$ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
$ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
$ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
$ high : int 5 2 3 3 4 3 3 3 3 3 ...
$ pqty : int 65 26 18 45 20 45 21 39 24 30 ...
$ boh : int 4372 58 1199 51 836 116 64 312 371 389 ...
$ awm : num 694 44.3 53.8 35 0.8 ...
$ cubes : num 0.586 1.06 1.821 0.563 1.328 ...
$ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
$ adm : num 99.143 6.329 7.686 5 0.114 ...
$ tie : num 13 13 6 15 5 15 7 13 8 10 ...
$ origComm: int 457 57 547 40 541 39 552 40 552 46 ...
$ days : num 0.656 6.162 2.342 11.998 216.853 ...
$ class : chr "single" "double" "single" "double" ...
$ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ...
$ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
$ type.x : int 2 2 2 2 2 2 2 2 2 2 ...
$ height.y: num 47 47 47 47 47 47 47 47 47 47 ...
$ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ...
$ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ type.y : int 2 2 2 2 2 2 2 2 2 2 ...
$ depth : num 48 48 48 48 48 48 48 48 48 48 ...
$ height : num 47 47 47 47 47 47 47 47 47 47 ...
$ index : int 1 2 3 4 5 6 7 8 9 10 ...
Now the "allocSlots" dataframe also has a column name 'height'
> str(allocSlots)
'data.frame': 2462 obs. of 6 variables:
$ loc : chr "F1AA02" "F1AA12" "F1AA22" "F1AA32" ...
$ height: num 72 72 72 72 72 72 72 72 72 72 ...
$ depth : num 48 48 48 48 48 48 48 48 48 48 ...
$ bay : chr "F1AA0" "F1AA0" "F1AA2" "F1AA2" ...
$ indx : int 1675 1617 1386 1096 1077 963 816 471 275 259 ...
$ type : int 1 1 1 1 1 1 1 1 1 1 ...
Here is the result of the 'merge': (notice that there are now two
'height.x' and 'height.y' columns in the dataframe:
> str(x)
'data.frame': 2219 obs. of 35 variables:
$ index : int 1 2 3 4 5 6 7 8 9 10 ...
$ loc.x : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
$ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
$ comm : int 451 57 560 40 560 39 560 40 560 46 ...
$ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
$ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH
FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...
$ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
$ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
$ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
$ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
$ high : int 5 2 3 3 4 3 3 3 3 3 ...
$ pqty : int 65 26 18 45 20 45 21 39 24 30 ...
$ boh : int 4372 58 1199 51 836 116 64 312 371 389 ...
$ awm : num 694 44.3 53.8 35 0.8 ...
$ cubes : num 0.586 1.06 1.821 0.563 1.328 ...
$ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
$ adm : num 99.143 6.329 7.686 5 0.114 ...
$ tie : num 13 13 6 15 5 15 7 13 8 10 ...
$ origComm: int 457 57 547 40 541 39 552 40 552 46 ...
$ days : num 0.656 6.162 2.342 11.998 216.853 ...
$ class : chr "single" "double" "single" "double" ...
$ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ...
$ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
$ type.x : int 2 2 2 2 2 2 2 2 2 2 ...
$ height.y: num 47 47 47 47 47 47 47 47 47 47 ...
$ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ...
$ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ type.y : int 2 2 2 2 2 2 2 2 2 2 ...
$ depth.x : num 48 48 48 48 48 48 48 48 48 48 ...
$ height.x: num 47 47 47 47 47 47 47 47 47 47 ...
$ loc.y : chr "F1KC22" "F1BM34" "F1HC73" "F1FJ65" ...
$ height.y: num 72 44 72 44 72 44 72 44 72 72 ...
$ depth.y : num 48 48 48 48 48 48 48 48 48 48 ...
$ bay : chr "F1KC2" "F1BM2" "F1HC7" "F1FJ5" ...
$ type : int 1 2 1 2 1 2 1 2 1 1 ...
My workaround is to change one of the "height" to something else to
avoid the problem, but someone else might stumble on the same error.
Should we expect 'merge' to ensure that the column names are unique in
the result?
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
More information about the R-help
mailing list