[R] slightly OT: (un)supervised clustering?
viktoras didziulis
viktoras at ekoinf.net
Tue Oct 28 20:32:58 CET 2008
Hi,
my question is not exactly about R... What I am looking for are hints
and directions on suitable methods (available in R or elsewhere) to
solve a grouping (or pattern recognition) problem of environmental
features in an environmental gradient as described below.
Given an environmental sampling data set with the columns (Depth in m,
Presence of sand, Presence of boulders, Presence of clay):
1 1 1 0
1 1 0 0
1 1 1 0
2 1 1 0
3 1 1 0
3 1 1 0
4 1 1 0
5 1 0 0
5 1 0 0
5 1 1 0
5 1 0 0
6 1 0 0
6 1 0 0
6 1 1 0
7 1 0 1
7 1 0 0
8 1 0 1
9 1 1 1
9 1 0 1
9 1 0 1
Once I have sampling data ordered by depth, using my own "expert"
opinion I can distinguish 3 groups A, B, C: A (1 - 4 m depth range) -
where both sand and boulders are present, B (5 - 6 m range) - where sand
is dominant with just a few observations of boulders, C (7 - 9 m range)
- substrate dominated by sand and clay.
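
To make the expert grouping above concrete, here is a minimal Python
sketch (not R) that tabulates how often each substrate occurs within the
three depth ranges. The names `samples`, `groups` and `summarize` are
made up for this illustration; the rows are copied from the sample data
above.

```python
# Rows copied from the sample data: (depth_m, sand, boulders, clay)
samples = [
    (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 1, 0),
    (3, 1, 1, 0), (4, 1, 1, 0), (5, 1, 0, 0), (5, 1, 0, 0), (5, 1, 1, 0),
    (5, 1, 0, 0), (6, 1, 0, 0), (6, 1, 0, 0), (6, 1, 1, 0), (7, 1, 0, 1),
    (7, 1, 0, 0), (8, 1, 0, 1), (9, 1, 1, 1), (9, 1, 0, 1), (9, 1, 0, 1),
]

# Depth ranges of the expert groups as described in the text (inclusive)
groups = {"A": (1, 4), "B": (5, 6), "C": (7, 9)}


def summarize(samples, groups):
    """Return {group: (n, sand_frac, boulders_frac, clay_frac)}."""
    out = {}
    for name, (lo, hi) in groups.items():
        rows = [r for r in samples if lo <= r[0] <= hi]
        n = len(rows)
        # Fraction of samples in this depth range where each feature is present
        fractions = tuple(sum(r[i] for r in rows) / n for i in (1, 2, 3))
        out[name] = (n,) + fractions
    return out


summary = summarize(samples, groups)
for name, (n, sand, boulders, clay) in summary.items():
    print(f"{name}: n={n} sand={sand:.2f} boulders={boulders:.2f} clay={clay:.2f}")
```

The per-group fractions are what the expert description summarizes in
words: boulders in most samples of A, in few samples of B, and clay in
most samples of C.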
Now the question - is there any formal method that can do the same, i.e.
separate the groups A, B and C by analyzing how feature occurrence
patterns change in samples along an environmental gradient (depth in
this case)? The sample dataset here is simplified; in fact I have to deal
with a dozen features like salinity, exposure and related species
lists. I "see" these groups as an expert, but it would be nice to have a
helper algorithm that sees the groups for me, so I could describe it in
the Methods section of my writings :-)
A similarity matrix followed by cluster analysis or MDS does not perform
as expected, because it groups stations from group A together with
stations of other groups that have the most similar substrate
observations, i.e. it ignores the environmental gradient.
Discriminant analysis expects me to do the grouping and then it will
"decide" the rest. Therefore not suitable.
A bunch of significance tests can help in deciding whether the
differences are statistically significant. But again, I have to present
my own groups, therefore - not suitable.
Other unsupervised learning algorithms (Neural Networks & Co) - well,
how can I instruct them to do the analysis along an environmental
gradient of depth?
If anyone among the experts on this list has dealt with similar problems
before, I would highly appreciate it if you could briefly describe your
approaches or point me to the right sources.
And in general I am interested in approaches to locating discontinuities
in data patterns sampled along environmental gradients.
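
To show the kind of thing I mean by "locating discontinuities", here is
a rough Python sketch of one possible formalisation. It is only my
assumption of how this could be done, not an established method from
this list: groups are forced to be contiguous depth ranges, breaks may
fall only between distinct depths, and the breaks are chosen to minimise
the within-group sum of squares of the presence/absence columns. The
number of groups `k` still has to be supplied by me, and the names
`segment_cost` and `split_along_gradient` are made up.

```python
from itertools import groupby

# Rows copied from the sample data: (depth_m, sand, boulders, clay)
samples = [
    (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0), (2, 1, 1, 0), (3, 1, 1, 0),
    (3, 1, 1, 0), (4, 1, 1, 0), (5, 1, 0, 0), (5, 1, 0, 0), (5, 1, 1, 0),
    (5, 1, 0, 0), (6, 1, 0, 0), (6, 1, 0, 0), (6, 1, 1, 0), (7, 1, 0, 1),
    (7, 1, 0, 0), (8, 1, 0, 1), (9, 1, 1, 1), (9, 1, 0, 1), (9, 1, 0, 1),
]


def segment_cost(rows):
    """Sum of squared deviations of each feature from its segment mean."""
    n = len(rows)
    cost = 0.0
    for j in (1, 2, 3):  # sand, boulders, clay
        mean = sum(r[j] for r in rows) / n
        cost += sum((r[j] - mean) ** 2 for r in rows)
    return cost


def split_along_gradient(samples, k):
    """Split depth-ordered samples into k contiguous depth ranges (min cost)."""
    ordered = sorted(samples, key=lambda r: r[0])
    # One block of rows per distinct depth; breaks fall only between blocks
    blocks = [list(g) for _, g in groupby(ordered, key=lambda r: r[0])]
    m = len(blocks)

    def cost(i, j):  # cost of pooling blocks i .. j-1 into one group
        return segment_cost([r for b in blocks[i:j] for r in b])

    inf = float("inf")
    # best[g][j]: minimal cost of splitting the first j blocks into g groups
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, m + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j], back[g][j] = c, i

    # Walk the back-pointers from the last block to recover the depth ranges
    ranges, j = [], m
    for g in range(k, 0, -1):
        i = back[g][j]
        ranges.append((blocks[i][0][0], blocks[j - 1][0][0]))
        j = i
    return ranges[::-1]


print(split_along_gradient(samples, 3))
```

On the 20 sample rows above with k = 3 this gives the ranges 1-4, 5-6
and 7-9, i.e. the same groups I picked by eye, but I do not know whether
this criterion is appropriate for a dozen mixed features.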
Best wishes!
Viktoras Didziulis
P.S. just subscribed to this list, sorry if I'm missing something