[R] Merge sort
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Apr 20 14:02:53 CEST 2016
On 20/04/2016 7:38 AM, Gaston wrote:
> I indeed used is.na() to check length, as I was not sure weather
> lenght() was a simple query or would go through the whole vector to
> count the elements.
length() is a simple query, and is very fast. The other problem in your
approach (which may not be a problem with your current data) is that NA
is commonly used as an element of a vector to represent a missing value.
>
> So to sum up, function calls are expensive, therefore recursion should
> be avoided, and growing the size of a vector (which is probably
> reassigning and copying?) is also expensive.
"Avoided" may be too strong: speed isn't always a concern, sometimes
clarity is more important. Growing vectors is definitely expensive.
Duncan Murdoch
>
> Thank you for your help!
>
>
>
> On 04/19/2016 11:51 PM, Duncan Murdoch wrote:
>> On 19/04/2016 3:39 PM, Gaston wrote:
>>> Hello everyone,
>>>
>>> I am learning R since recently, and as a small exercise I wanted to
>>> write a recursive mergesort. I was extremely surprised to discover that
>>> my sorting, although operational, is deeply inefficient in time. Here is
>>> my code :
>>>
>>>> merge <- function(x,y){
>>>> if (is.na(x[1])) return(y)
>>>> else if (is.na(y[1])) return(x)
>>>> else if (x[1]<y[1]) return(c(x[1],merge(x[-1],y)))
>>>> else return(c(y[1],merge(x,y[-1])))
>>>> }
>>>>
>>>> division <- function(x){
>>>> if (is.na(x[3])) return(cbind(x[1],x[2]))
>>>> else
>>>> return(cbind(c(x[1],division(x[-c(1,2)])[,1]),c(x[2],division(x[-c(1,2)])[,2])))
>>>>
>>>> }
>>>>
>>>> mergesort <- function(x){
>>>> if (is.na(x[2])) return(x)
>>>> else{
>>>> print(x)
>>>> t=division(x)
>>>> return(merge(mergesort(t[,1]),mergesort(t[,2])))
>>>> }
>>>> }
>>>
>>> I tried my best to write it "the R-way", but apparently I failed. I
>>> suppose some of the functions I used are quite heavy. I would be
>>> grateful if you could give a hint on how to change that!
>>>
>>> I hope I made myself clear and wish you a nice day,
>>
>> Your use of is.na() looks strange. I don't understand why you are
>> testing element 2 in mergesort(), and element 1 in merge(), and
>> element 3 in division. Are you using it to test the length? It's
>> better to use the length() function for that.
>>
>> The division() function returns a matrix. It would make more R-sense
>> to return a list containing the two parts, because they might not be
>> the same length.
>>
>> Generally speaking, function calls are expensive in R, so the
>> recursive merge you're using looks like it would be the bottleneck.
>> You'd almost certainly be better off to allocate something of
>> length(x) + length(y), and do the assignments in a loop.
>>
>> Here's a merge sort I wrote as an illustration in a class. It's
>> designed for clarity rather than speed, but I'd guess it would be
>> faster than yours:
>>
>> mergesort <- function(x) {
>>
>> n <- length(x)
>> if (n < 2) return(x)
>>
>> # split x into two pieces of approximately equal size, x1 and x2
>>
>> x1 <- x[1:(n %/% 2)]
>> x2 <- x[(n %/% 2 + 1):n]
>>
>> # sort each of the pieces
>> x1 <- mergesort(x1)
>> x2 <- mergesort(x2)
>>
>> # merge them back together
>> result <- c()
>> i <- 0
>> while (length(x1) > 0 && length(x2) > 0) {
>> # compare the first values
>> if (x1[1] < x2[1]) {
>> result[i + 1] <- x1[1]
>> x1 <- x1[-1]
>> } else {
>> result[i + 1] <- x2[1]
>> x2 <- x2[-1]
>> }
>> i <- i + 1
>> }
>>
>> # put the smaller one into the result
>> # delete it from whichever vector it came from
>> # repeat until one of x1 or x2 is empty
>> # copy both vectors (one is empty!) onto the end of the results
>> result <- c(result, x1, x2)
>> result
>> }
>>
>> If I were going for speed, I wouldn't modify the x1 and x2 vectors,
>> and I'd pre-allocate result to the appropriate length, rather than
>> growing it in the while loop. But that was a different class!
>>
>> Duncan Murdoch
>
More information about the R-help
mailing list