[R-sig-eco] The final result of TWINSPAN

Wed Apr 27 19:48:59 CEST 2011

Thanks Jari,

     My original thought was to write a wrapper for the original FORTRAN 
code by replacing the file read of data with data passed from R, and 
then bringing in the results in a list.  That would allow sizing the 
arrays at run-time and eliminating fixed array sizes.  I have a copy of 
the FORTRAN code that Petr Smilauer modified for simplified 
input/output, and that helped.  Still, it ultimately appeared pretty messy 
(but still might be the best route), so I tried separating out the 
subroutines and calling them individually from R.  From there I tried 
replacing some of the subroutines with native R to lower overhead.  But 
in the end I just couldn't understand the code well enough to make it work.

     So then I thought I should write a totally transparent 
version in native R, even if it doesn't replicate the original.  On the 
downside people can say it's not correct; on the upside it's open 
source and people can evaluate it and modify it as they see fit.  So, if 
there is interest I might post the code and examples on my web page and 
let somebody else have it to run with.
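
     To give a sense of how small a transparent version of a single 
division can be, here is a rough sketch.  It is an illustration only, 
written in Python rather than R, and it is NOT the TWINSPAN algorithm: 
it finds the first CA axis by reciprocal averaging and splits the sites 
at the weighted centroid, with no pseudospecies, no indicator stage and 
no polarization.  The function names (first_ca_axis, split_sites) and 
the toy table are made up for the example, and the table is assumed to 
have no empty rows or columns.

```python
def first_ca_axis(table, n_iter=200):
    """Site scores on the first CA axis, found by reciprocal averaging."""
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites))
               for j in range(n_species)]
    grand = sum(row_tot)
    # Arbitrary non-constant starting scores for the sites.
    site = [float(i) for i in range(n_sites)]
    for _ in range(n_iter):
        # Species scores: abundance-weighted mean of the site scores.
        species = [sum(table[i][j] * site[i] for i in range(n_sites)) / col_tot[j]
                   for j in range(n_species)]
        # Site scores: abundance-weighted mean of the species scores.
        site = [sum(table[i][j] * species[j] for j in range(n_species)) / row_tot[i]
                for i in range(n_sites)]
        # Centre and rescale, weighting by site totals, so the iteration
        # does not collapse onto the trivial constant solution.
        mean = sum(w * s for w, s in zip(row_tot, site)) / grand
        site = [s - mean for s in site]
        norm = (sum(w * s * s for w, s in zip(row_tot, site)) / grand) ** 0.5
        site = [s / norm for s in site]
    return site


def split_sites(table):
    """One division: split the sites at the centroid (zero) of the axis."""
    scores = first_ca_axis(table)
    left = [i for i, s in enumerate(scores) if s < 0]
    right = [i for i, s in enumerate(scores) if s >= 0]
    # Return everything silently in one structure instead of printing.
    return {"scores": scores, "left": left, "right": right}


# Hypothetical toy data: rows are sites, columns are species abundances.
table = [
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [0, 1, 4, 5],
    [0, 0, 3, 5],
]
result = split_sites(table)
```

On this toy table the first two sites end up on one side of the split 
and the last two on the other.  Repeating the same call on each subset 
would give a divisive hierarchy, but reproducing real TWINSPAN output 
would need the indicator and polarization stages as well.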

Dave

Jari Oksanen wrote:
> On 27/04/11 00:40 AM, "Dave Roberts" <dvrbts at ecology.msu.montana.edu> wrote:
>>      Earlier this year on an (undoubtedly ill-advised) lark I coded up
>> an R version of TWINSPAN.  It's far from a polished package at this
>> point, but the code does run.  One of the interesting features is that
>> you can partition a PCO or NMDS in addition to the traditional CA. To be
>> clear, I am not a TWINSPAN fan either, but I wanted it for a methods
>> paper I was working on.
>>
>>      The problem is that I based the code on Hill, Bunch & Shaw (1975,
>> J of Ecol 63:597-613) which is what I had available.  Apparently the
>> algorithm in the commercial TWINSPAN is significantly modified from the
>> original, but I couldn't find a description of the actual algorithm
>> anywhere in the literature.  It is probably described in the User Manual
>> of the software, but I was not sufficiently motivated to chase down a
>> copy.  I do have a copy of the FORTRAN code, but it was apparently
>> written in FORTRAN II, and is basically inscrutable, even to an old
>> FORTRAN dog like me.
>>
>>      So, if somebody has a clear description of the actual algorithm
>> (and I think it is disturbing that I could not find one), it would be
>> possible to code it up in native R.  The alternative, to write a wrapper
>> for the original FORTRAN code is not a trivial task.  I gave it a couple
>> of days and gave up.
> 
> Dave,
> 
> Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the
> implementation is more complicated. Martin Kent and Paddy Coker do a great
> job of explaining the twists in their book ("Vegetation Description and
> Analysis: A Practical Approach"). If I remember correctly, the TWINSPAN
> manual also was more detailed, but I lost it somewhere when I moved around
> (for the kids: it was a bunch of paper; PDF had not yet been invented when
> TWINSPAN was published).
> 
> I don't think that the actual TWINSPAN is easily extended beyond CA. Each
> step is a two-stage one-dimensional ordination on a current subset, where
> the first stage selects indicators and the second stage is polarized for the
> indicator species. The final split is based on site ordination and
> indicators are secondary (which we see in misclassifications if you try to
> use the provided key for the data that was classified in TWINSPAN). The
> polarization stage is particularly challenging when working with
> dissimilarities (PCO, NMDS).
> 
> I don't think that the FORTRAN I have is completely impenetrable. I think
> the largest problem is the design principle: R code should run silently and
> return a result, but TWINSPAN prints as it goes along and returns only a part
> of the result. Incorporating that in R would require stripping most PRINT and
> WRITE statements and having the subroutines return useful data directly.
> 
> I also wrote a small, funny test of the TWINSPAN principle, where the splitting
> and pre-defined pseudospecies were replaced with a regression tree split.
> I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a
> separate message.
> 
> Cheers, Jari Oksanen
> 