[R] how to rewrite this without a loop ?

Stijn Lievens stijn.lievens at ugent.be
Fri Nov 19 10:37:10 CET 2004


Thomas Lumley wrote:
> On Thu, 18 Nov 2004, Stijn Lievens wrote:
> 
>>
>> <code>
>> add.fun <- function(perf.data) {
>>   ss <- 0
>>   for (i in 0:29) {
>>     ss <- ss + cor(subset(perf.data, dataset == i)[3],
>>                    subset(perf.data, dataset == i)[7],
>>                    method = "kendall")
>>   }
>>   ss
>> }
>> </code>
>>
>> As one can see this function uses a for-loop.  Now chapter 9 of 'An 
>> introduction to R' tells us that we should avoid for-loops as much as 
>> possible.
> 
> 
> 
> You don't say whether `dataset' is the name of a column in `perf.data'. 
> Assuming it is, and assuming that 0:29 are all the values of `dataset'
> 
> sum(by(perf.data, list(perf.data$dataset),
>           function(d)  cor(d[,3],d[,7], method="kendall")))
> 
> would work.  

Indeed, this works.  The 'by' function is exactly what I was looking for.
As far as I can tell, this useful function isn't mentioned in 'An 
introduction to R'.
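
For anyone reading along, a minimal sketch of the comparison on made-up
data.  The three-group layout and every column name other than `dataset'
are assumptions for illustration only; the real perf.data is not shown
in this thread.

```r
# Toy stand-in for perf.data: 3 groups of 10 rows, 7 columns in total.
# Columns 3 (x) and 7 (y) are the ones being correlated.
set.seed(1)
perf.data <- data.frame(
  dataset = rep(0:2, each = 10),
  a = rnorm(30), x = rnorm(30), b = rnorm(30),
  c = rnorm(30), d = rnorm(30), y = rnorm(30)
)

# Loop version: accumulate Kendall's tau group by group.
loop.sum <- 0
for (i in 0:2) {
  d <- subset(perf.data, dataset == i)
  loop.sum <- loop.sum + cor(d[, 3], d[, 7], method = "kendall")
}

# by() version: split on `dataset`, compute tau per group, then sum.
by.sum <- sum(by(perf.data, list(perf.data$dataset),
                 function(d) cor(d[, 3], d[, 7], method = "kendall")))

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop.sum, by.sum)))
```

The loop here subsets once per group rather than twice, but the quantity
being summed is the same as in the original function.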

> If this is faster it will be because you don't call 
> subset() twice per iteration, rather than because you are avoiding a 
> loop.  However it has other benefits: it doesn't have the variable `i', 
> it doesn't have to change the value of `ss', and it doesn't have the 
> range of `dataset' hard-coded into it.  These are all clarity 
> optimisations.
> 
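
To illustrate the point about not hard-coding the range of `dataset', here
is a small sketch, again on invented data.  The wrapper name
kendall_total, the group labels and the column names are hypothetical.
The same by() call works unchanged when the groups are not numbered 0:29.

```r
# Hypothetical wrapper around the by() one-liner from this thread.
kendall_total <- function(perf.data) {
  sum(by(perf.data, list(perf.data$dataset),
         function(d) cor(d[, 3], d[, 7], method = "kendall")))
}

# Made-up data whose groups are labelled with strings instead of 0:29.
set.seed(2)
perf.data <- data.frame(
  dataset = rep(c("run-a", "run-b", "run-c"), each = 8),
  a = rnorm(24), x = rnorm(24), b = rnorm(24),
  c = rnorm(24), d = rnorm(24), y = rnorm(24)
)

# No loop index and no hard-coded range: the groups come from the data.
total <- kendall_total(perf.data)
```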

In fact I don't care too much about speed at the moment, but a one-line 
statement is more convenient to type (and recall) at the command line 
than a multi-line statement.

Your solution really does the trick for me.  Thanks,

Stijn.


>     -thomas



