[R] dist like function but where you can configure the method

Jari Oksanen jari.oksanen at oulu.fi
Fri May 16 18:54:30 CEST 2014


I did not regard the loops as overhead but as part of the process. The overhead is setting the attributes. The loop is not very expensive compared to ks.test(). You can always replace the loop with an apply over the vector of indices, but about the only way to speed up the calculations is to use parallel processing (the parLapply, parSapply and parRapply functions of the parallel package).
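
As a minimal sketch of the above (not a definitive implementation): the pair loop is replaced by an apply over a vector of pair indices, the result is turned into a "dist" object by setting the attributes listed in my earlier message quoted below, and a cluster can optionally be passed in for parSapply. The names custom_dist, one_pair, idx, dat and dfun are hypothetical, and I assume the columns of x are the items to compare, as in the loops in the quoted message.

```r
library(parallel)

# Hypothetical helper: apply dfun to the k-th pair of columns of dat.
# Argument names avoid x/fun/X/FUN, which clash inside parLapply().
one_pair <- function(k, idx, dat, dfun) dfun(dat[, idx[1, k]], dat[, idx[2, k]])

# Hypothetical helper: build a "dist" object from a pairwise function f.
# cl is an optional cluster created with parallel::makeCluster().
custom_dist <- function(x, f, cl = NULL, method = "custom") {
  n <- ncol(x)
  idx <- combn(n, 2)            # pairs (i < j) in lower-triangle order
  ks <- seq_len(ncol(idx))      # the vector of indices to iterate over
  d <- if (is.null(cl)) {
    vapply(ks, one_pair, numeric(1), idx = idx, dat = x, dfun = f)
  } else {
    parSapply(cl, ks, one_pair, idx = idx, dat = x, dfun = f)
  }
  # The "overhead": a plain vector plus attributes
  attr(d, "Size") <- n
  attr(d, "Labels") <- colnames(x)   # optional, may be NULL
  attr(d, "Diag") <- FALSE
  attr(d, "Upper") <- FALSE
  attr(d, "method") <- method
  attr(d, "call") <- match.call()
  class(d) <- c("mydist", "dist")
  d
}

mydistfun <- function(x, y) ks.test(x, y)$p.value  # from the original question

set.seed(1)
x <- matrix(rnorm(200), nrow = 20)   # 10 columns to compare
d <- custom_dist(x, mydistfun, method = "ks")

cl <- makeCluster(2)                 # assumes 2 local workers are available
d_par <- custom_dist(x, mydistfun, cl = cl, method = "ks")
stopCluster(cl)
```

Whether the parallel version pays off depends on how expensive the pairwise function is relative to the cost of sending the data to the workers; for small inputs the serial vapply is likely to be faster.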

I wrote about vectorization: that would be faster, but it cannot be applied blindly to just "any function"; you must deconstruct the function to see if it can be decomposed into operations on vectors. In vegan:::designdist we do that for some function types, but you really must *think* about the function you are using to know if you can write it in vectorized form. It is not automatic.
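
To illustrate what I mean by deconstructing a function, here is a sketch for a distance that does decompose: the Euclidean distance between rows, using the identity d(i,j)^2 = |x_i|^2 + |x_j|^2 - 2 x_i.x_j. The name vec_euclid is hypothetical and this is not how designdist is implemented; it only shows computing the full symmetric matrix with matrix operations and then discarding the diagonal and upper triangle. Whether something like ks.test() can be rewritten in this way is a separate question.

```r
# Hypothetical helper: Euclidean distances between the rows of x,
# computed with whole-matrix operations instead of a pair loop.
vec_euclid <- function(x) {
  sq <- rowSums(x^2)                            # squared norm of each row
  d2 <- outer(sq, sq, "+") - 2 * tcrossprod(x)  # all squared distances
  d2[d2 < 0] <- 0      # guard against tiny negative rounding errors
  as.dist(sqrt(d2))    # keep the lower triangle only
}

set.seed(1)
x <- matrix(rnorm(50), nrow = 10)
d_vec <- vec_euclid(x)
d_ref <- dist(x)       # reference result from stats::dist
```

The two results should agree up to rounding error, which can be checked with all.equal(as.vector(d_vec), as.vector(d_ref)).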

Cheers, Jari Oksanen
On 16/05/2014, at 18:46 PM, Witold E Wolski wrote:

> Dear Jari,
> 
> Thanks for your reply...
> 
> The overhead would be
> 2 for loops
> for(i in 1:dim(x)[2])
> for(j in i:dim(x)[2])
> 
> isn't it? Or are you seeing a different way to implement it?
> 
> A for loop is pretty expensive in R. Therefore I am looking for an
> implementation similar to apply or lapply, where the iteration is done
> in native code.
> 
> 
> On 16 May 2014 15:57, Jari Oksanen <jari.oksanen at oulu.fi> wrote:
>> Witold E Wolski <wewolski <at> gmail.com> writes:
>> 
>>> 
>>> Looking for a fast dist implementation
>>> where I could pass my own dist function to the "method" parameter
>>> 
>>> i.e.
>>> 
>>> mydistfun = function(x,y){
>>> return(ks.test(x,y)$p.value)   #some mystique implementation
>>> }
>>> 
>>> wow = dist(data,method=mydistfun)
>> 
>> I think it is best to write that function yourself.
>> 
>> The "dist" object is a vector corresponding to a lower triangle
>> (without the diagonal) of a symmetric matrix and with attributes.
>> The attributes are class which should be c("mydist", "dist"), Size
>> which is the length(x), Labels (optional) which are the
>> names of your items and if given, should have length(x),
>> call = match.call(), Diag = FALSE, Upper = FALSE and method name.
>> All you need is a vector with attributes.
>> 
>> All this will add very little overhead to your calculation, so
>> for all practical purposes this implementation is just as fast as
>> is your "mystique implementation" of pairwise distances. Your
>> example (ks.test()) probably would be pretty slow. If you can
>> vectorize your distance, it can be really fast, even if you
>> calculate the full symmetric matrix and throw away the diagonal and
>> upper triangle.
>> 
>> Cheers, Jari Oksanen
>> 
> 
> 
> 
> -- 
> Witold Eryk Wolski


