[R] Sort problem in merge()

Jean Eid jeaneid at chass.utoronto.ca
Mon Mar 6 17:15:38 CET 2006


I believe that merge is putting first the rows whose cells are
nonempty, i.e. the rows that found a match. For example, if you instead did

 >  tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F","A"), levs), 
col2 = 1:5)
 > tmp2
  col1 col2
1    C    1
2    D    2
3    E    3
4    F    4
5    A    5
 > merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    A    5
2    A    5
3    C    1
4    C    1
5    0   NA
6    0   NA
 > tmp1
  col1
1    A
2    A
3    C
4    C
5    0
6    0
 >


and if you do this

 >  tmp1 <- data.frame(col1 = factor(c("0", "0", "C", "C", "A", "A"), levs))
 > merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
  col1 col2
1    C    1
2    C    1
3    A    5
4    A    5
5    0   NA
6    0   NA
 >

So I think it is doing what you want it to do.
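
If the goal is simply to get the result back in the row order of tmp1, one
possible workaround is to record the row position before merging and reorder
afterwards. This is only a minimal sketch: it assumes character columns as in
the later part of the original post, and the helper column name row_id is
made up for this example, not something merge() provides.

```r
# Sketch: restore the row order of tmp1 after a left join with merge().
# 'row_id' is a hypothetical helper column introduced for this example.
tmp1 <- data.frame(col1 = c("A", "A", "C", "C", "0", "0"),
                   stringsAsFactors = FALSE)
tmp2 <- data.frame(col1 = c("C", "D", "E", "F"), col2 = 1:4,
                   stringsAsFactors = FALSE)

# Remember where each row of tmp1 sits before merging.
tmp1$row_id <- seq_len(nrow(tmp1))

# Keep every row of tmp1; unmatched rows get NA in col2.
res <- merge(tmp1, tmp2, by = "col1", all.x = TRUE, sort = FALSE)

# Reorder by the saved position and drop the helper column.
res <- res[order(res$row_id), c("col1", "col2")]
rownames(res) <- NULL
res
```

The reordering step does not depend on where merge() happens to place the
unmatched rows, so it should work the same way with all.y = TRUE and the
arguments swapped.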


Jean

Gregor Gorjanc wrote:

>Gabor Grothendieck wrote:
>  
>
>>If you make the levels the same does that give what you want:
>>
>>levs <- c(LETTERS[1:6], "0")
>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
>>merge(tmp2, tmp1, all = TRUE, sort = FALSE)
>>merge(tmp1, tmp2, all = TRUE, sort = FALSE)
>>    
>>
>
>Gabor, thanks for this, but unfortunately the result is the same. I get
>the following both ways - note that I use all.x or all.y = TRUE.
>
>  
>
>>merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
>>    
>>
>  col1 col2
>1    C    1
>2    C    1
>3    A   NA
>4    A   NA
>5    0   NA
>6    0   NA
>
>But I want the order as it is in tmp1:
>
>  col1
>1    A
>2    A
>3    C
>4    C
>5    0
>6    0
>
>
>
>
>  
>
>>>Hello!
>>>
>>>I am merging two datasets and I have encountered a problem with sort.
>>>Can someone please point me to my error. Here is the example.
>>>
>>>## I have dataframes, first one with factor and second one with factor
>>>## and integer
>>>
>>>      
>>>
>>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
>>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
>>>>tmp1
>>>>        
>>>>
>>>col1
>>>1    A
>>>2    A
>>>3    C
>>>4    C
>>>5    0
>>>6    0
>>>
>>>      
>>>
>>>>tmp2
>>>>        
>>>>
>>>col1 col2
>>>1    C    1
>>>2    D    2
>>>3    E    3
>>>4    F    4
>>>
>>>## Now merge them
>>>
>>>      
>>>
>>>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>                all.x = TRUE, sort = FALSE))
>>>col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## As you can see, sort was applied, since the row order is not the same
>>>## as in tmp1. Reading the help page for ?merge did not reveal much about
>>>## sorting. However, I did try to see the result of "non-default" -
>>>## the help page says that the order should be the same as in 'y'. So the
>>>## above makes sense.
>>>
>>>## Now merge - but swap x and y
>>>
>>>      
>>>
>>>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>                all.y = TRUE, sort = FALSE))
>>>col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## The result is the same. I am stumped here. But looking a bit at these
>>>## objects I found something peculiar
>>>
>>>
>>>      
>>>
>>>>str(tmp1)
>>>>        
>>>>
>>>`data.frame':   6 obs. of  1 variable:
>>>$ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
>>>
>>>      
>>>
>>>>str(tmp2)
>>>>        
>>>>
>>>`data.frame':   4 obs. of  2 variables:
>>>$ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
>>>$ col2: int  1 2 3 4
>>>
>>>      
>>>
>>>>str(tmp12)
>>>>        
>>>>
>>>`data.frame':   6 obs. of  2 variables:
>>>$ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
>>>$ col2: int  1 1 NA NA NA NA
>>>
>>>      
>>>
>>>>str(tmp21)
>>>>        
>>>>
>>>`data.frame':   6 obs. of  2 variables:
>>>$ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
>>>$ col2: int  1 1 NA NA NA NA
>>>
>>>## Is it OK that the internal representation of factors varies between
>>>## different merges? The levels also differ: in the first case only the
>>>## levels from the original data.frame are used, while in the second
>>>## all levels are propagated.
>>>
>>>## I have tried the same with characters
>>>
>>>      
>>>
>>>>tmp1$col1 <- as.character(tmp1$col1)
>>>>tmp2$col1 <- as.character(tmp2$col1)
>>>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>                all.x = TRUE, sort = FALSE))
>>>col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>
>>>      
>>>
>>>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>                all.y = TRUE, sort = FALSE))
>>>col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>## The same with characters. Is this a bug? It definitely does not agree
>>>## with the help page, since the order is not the same as in 'y'. Can
>>>## someone please check on newer versions?
>>>
>>>## Is there any other way to get the same order as in 'y' i.e. tmp1?
>>>
>>>
>>>      
>>>
>>>>R.version
>>>>        
>>>>
>>>       _
>>>platform i486-pc-linux-gnu
>>>arch     i486
>>>os       linux-gnu
>>>system   i486, linux-gnu
>>>status
>>>major    2
>>>minor    2.0
>>>year     2005
>>>month    10
>>>day      06
>>>svn rev  35749
>>>language R
>>>
>>>Thank you very much!
>>>
>>>--
>>>Lep pozdrav / With regards,
>>>  Gregor Gorjanc
>>>
>>>----------------------------------------------------------------------
>>>University of Ljubljana     PhD student
>>>Biotechnical Faculty
>>>Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
>>>Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
>>>
>>>SI-1230 Domzale             tel: +386 (0)1 72 17 861
>>>Slovenia, Europe            fax: +386 (0)1 72 17 888
>>>
>>>----------------------------------------------------------------------
>>>"One must learn by doing the thing; for though you think you know it,
>>>you have no certainty until you try." Sophocles ~ 450 B.C.
>>>