[R] dist-like function where you can configure the method

Witold E Wolski wewolski at gmail.com
Fri May 16 17:46:21 CEST 2014


Dear Jari,

Thanks for your reply...

The overhead would be two for loops:

  for (i in 1:(dim(x)[2] - 1))
    for (j in (i + 1):dim(x)[2])

wouldn't it? Or do you see a different way to implement it?
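A minimal sketch of that double loop in R, assuming the items being compared are the columns of a numeric matrix x and that the distance function takes two vectors. The helper name pairdist_loop and the method label "custom" are hypothetical. The inner loop starts at i + 1 because a "dist" object stores only the lower triangle, column by column, without the diagonal; the attributes attached at the end are the ones Jari lists in the quoted reply.

```r
# Hypothetical helper: build a "dist" object over the columns of x
# using an arbitrary two-argument distance function `fun`.
pairdist_loop <- function(x, fun) {
  n <- ncol(x)
  d <- numeric(n * (n - 1) / 2)   # lower triangle, diagonal excluded
  k <- 1L
  for (i in seq_len(n - 1)) {     # column of the lower triangle
    for (j in (i + 1):n) {        # rows strictly below the diagonal
      d[k] <- fun(x[, i], x[, j])
      k <- k + 1L
    }
  }
  # Attach the attributes that turn the plain vector into a "dist" object
  attributes(d) <- list(Size = n, Diag = FALSE, Upper = FALSE,
                        method = "custom", call = match.call(),
                        class = c("mydist", "dist"))
  attr(d, "Labels") <- colnames(x)  # optional; a no-op when x has no colnames
  d
}
```

For example, pairdist_loop(x, function(u, v) ks.test(u, v)$p.value) would correspond to the call in the original question, at the cost of one ks.test() per pair of columns.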

A for loop is pretty expensive in R. Therefore I am looking for an
implementation similar to apply or lapply, where the iteration is done
in native code.
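A loop-free variant of the same idea, assuming the same column layout: utils::combn() with a FUN argument enumerates the pairs (1,2), (1,3), ..., (n-1,n), which is exactly the order in which a "dist" object stores its lower triangle. The name pairdist_combn is hypothetical. As far as I know, combn() is itself written in R, so this mainly tidies the code; for something like ks.test() the running time is dominated by the one function call per pair either way.

```r
# Hypothetical helper: same result as the double loop, but the pairs are
# enumerated by combn() instead of explicit for loops.
pairdist_combn <- function(x, fun) {
  n <- ncol(x)
  stopifnot(n >= 2)               # combn(n, 2) needs at least two items
  # combn(n, 2) yields the pairs (1,2), (1,3), ..., (n-1,n), i.e. the
  # column-wise lower-triangle order used by "dist" objects.
  d <- utils::combn(n, 2, FUN = function(p) fun(x[, p[1]], x[, p[2]]))
  # Replace all attributes (including any dim set by combn) with dist ones
  attributes(d) <- list(Size = n, Diag = FALSE, Upper = FALSE,
                        method = "custom", class = c("mydist", "dist"))
  attr(d, "Labels") <- colnames(x)
  d
}
```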





On 16 May 2014 15:57, Jari Oksanen <jari.oksanen at oulu.fi> wrote:
> Witold E Wolski <wewolski <at> gmail.com> writes:
>
>>
>> Looking for a fast dist implementation
>> where I could pass my own dist function to the "method" parameter
>>
>> i.e.
>>
>> mydistfun = function(x,y){
>>  return(ks.test(x,y)$p.value)   #some mystique implementation
>> }
>>
>> wow = dist(data,method=mydistfun)
>
> I think it is best to write that function yourself.
>
> The "dist" object is a vector corresponding to the lower triangle
> (without the diagonal) of a symmetric matrix, plus attributes:
> class, which should be c("mydist", "dist"); Size, which is
> length(x); Labels (optional), the names of your items, one per
> item if given; call = match.call(); Diag = FALSE; Upper = FALSE;
> and method, the method name.
> All you need is a vector with attributes.
>
> All this adds very little overhead to your calculation, so for
> all practical purposes this implementation is just as fast as
> your "mystique implementation" of pairwise distances. Your
> example (ks.test()) would probably be pretty slow. If you can
> vectorize your distance, it can be really fast, even if you
> calculate the full symmetric matrix and throw away the diagonal
> and upper triangle.
>
> Cheers, Jari Oksanen
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
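To illustrate the last point in Jari's reply, here is a sketch that assumes plain Euclidean distance between columns, chosen only because it vectorizes easily (a ks.test() p-value does not). It computes the full symmetric matrix with matrix algebra and lets stats::as.dist() keep the lower triangle. The function name coldist_vectorized is hypothetical.

```r
# Hypothetical example: Euclidean distances between the columns of x,
# computed for all pairs at once with matrix algebra instead of a loop.
coldist_vectorized <- function(x) {
  sq <- colSums(x^2)                # squared norm of each column
  g  <- crossprod(x)                # t(x) %*% x: all pairwise inner products
  d2 <- outer(sq, sq, "+") - 2 * g  # |u - v|^2 = |u|^2 + |v|^2 - 2 u.v
  d2 <- pmax(d2, 0)                 # clip tiny negative rounding errors
  # as.dist() keeps only the lower triangle, dropping diagonal and upper part
  stats::as.dist(sqrt(d2))
}
```

The same full-matrix-then-as.dist() pattern applies to any distance that can be written in terms of whole-matrix operations.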



-- 
Witold Eryk Wolski


