[R-sig-Geo] Classification of attribute table

Tue May 12 11:10:51 CEST 2009

Hi Dan, Dylan, Thierry & the rest of the list

Firstly, thanks for your input so far. Unfortunately I am running out of time: I need to complete the analysis before IGARSS 09 in Cape Town, so I don't think I will be able to implement your suggestions yet. In the meantime I have simply selected the segments that fall above the 50th percentile and used those for further analysis (segments with brighter values are considered tree crowns). I am not sure whether classification would improve my tree-counting accuracy. My initial results give roughly 70% accuracy when compared with field enumeration, and I can improve on that by tweaking the watershed segmentation.
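
A minimal Python sketch of the percentile selection described above, standing in for the R workflow used in the thread. The rows, the `id`, `mean` and `crown` fields, and the function name `flag_bright_segments` are all hypothetical. The sketch assumes "brightness" means the per-segment mean, adds a 1/0 column much like the extra attribute-table column discussed later in the thread, and uses `statistics.quantiles` (Python 3.8+).

```python
import statistics

# Hypothetical attribute-table rows: one dict per watershed segment.
# "mean" stands in for the per-segment mean brightness; values are invented.
segments = [
    {"id": 1, "mean": 112.0},
    {"id": 2, "mean": 87.5},
    {"id": 3, "mean": 140.2},
    {"id": 4, "mean": 95.0},
    {"id": 5, "mean": 131.7},
]

def flag_bright_segments(rows, field="mean", pct=50):
    """Add a 'crown' column (1/0) marking rows whose `field` exceeds the
    pct-th percentile of that field; return the cut-off value used."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    values = [r[field] for r in rows]
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cut = statistics.quantiles(values, n=100, method="inclusive")[pct - 1]
    for r in rows:
        r["crown"] = 1 if r[field] > cut else 0
    return cut

cut = flag_bright_segments(segments, pct=50)
crowns = [r["id"] for r in segments if r["crown"]]
print(cut, crowns)  # 112.0 [3, 5]
```

Calling the function again with `pct=90` overwrites the `crown` column, which makes it easy to compare the 50th- and 90th-percentile thresholds mentioned in the thread.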

To answer your question, Dan: the 1916 cases are fixed, with no additional cases, although the same analysis will be applied to at least 10 other discrete plantation forest compartments. Perhaps the probabilities of cluster membership from the initial clustering could be used to assign segments in the remaining compartments. I am not sure that will work, but it could be interesting to test for the paper associated with this work.

Once I have finished the poster for IGARSS I will revisit the classification, as this work is the final chapter of my PhD and I would like to get it published.

Many thanks to the list for all your assistance.
Kind regards,
Wesley

Wesley Roberts MSc.
Researcher: Earth Observation (Ecosystems)
Natural Resources and the Environment
CSIR
Tel: +27 (21) 888-2490
Fax: +27 (21) 888-2693

"To know the road ahead, ask those coming back."
- Chinese proverb

>>> Dan Putler <dan.putler at sauder.ubc.ca> 05/11/09 5:54 PM >>> 
Hi Wesley,

So you just want to partition the 1916 cases into three clusters. This
is a clustering problem rather than a discriminant analysis oriented
classification problem. As a result, Dylan Beaudette's suggestion of
using the clara() function is pretty reasonable, but your data set isn't
so large that other (more computationally intensive) algorithms can't be
used (assuming you have a machine with a reasonable amount of memory in
it). Moreover, some of your measures are very highly correlated with one
another (var and stdev for instance), so you can probably reduce the
number of variables used in the clustering.
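
One way to act on the point about highly correlated measures is a simple correlation screen before clustering. The Python sketch below is hypothetical throughout: the column values are invented, with `stdev` made exactly the square root of `var`, and the function names and the 0.95 threshold are assumptions. It keeps variables greedily in table order and drops any variable whose absolute Pearson correlation with an already-kept variable exceeds the threshold.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_correlated(columns, threshold=0.95):
    """Greedily keep variables in the given order, dropping any whose
    |r| with an already-kept variable exceeds `threshold`."""
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped

# Hypothetical columns from the attribute table (values invented);
# stdev is exactly sqrt(var), the kind of redundancy noted above.
columns = {
    "mean": [100, 140, 90, 130, 110],
    "var": [4, 9, 16, 25, 36],
    "stdev": [2, 3, 4, 5, 6],
}
kept, dropped = prune_correlated(columns)
print(kept, dropped)  # ['mean', 'var'] ['stdev']
```

The column order decides which member of a correlated pair survives. A constant column would make `pearson` divide by zero, so such columns should be removed first.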

Are the 1916 cases fixed, or will you want to take new cases and then
assign them to one of the three clusters created from the original
1916? If the latter, model-based clustering might make the most sense,
since it gives you a clean way of assigning new cases to the existing
clusters based on the posterior probability of cluster membership.
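
The assignment step described above can be sketched as Bayes' rule over a fitted Gaussian mixture. In this minimal Python version the three-component, diagonal-covariance mixture and all of its weights, means and variances are hypothetical, as if already fitted to the original segments on two variables. The EM fitting itself is not shown; in R, that role would be played by a model-based clustering package such as mclust. A new case receives the label of the component with the highest posterior probability.

```python
import math

# Hypothetical parameters of a 3-component Gaussian mixture with diagonal
# covariances, as if fitted to the original segments on two variables
# (segment mean and stdev). Values are illustrative only.
WEIGHTS = [0.3, 0.5, 0.2]
MEANS = [[140.0, 12.0], [95.0, 6.0], [60.0, 3.0]]
VARIANCES = [[100.0, 4.0], [80.0, 2.0], [50.0, 1.0]]

def log_gaussian_diag(x, mu, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )

def posterior(x, weights=WEIGHTS, means=MEANS, variances=VARIANCES):
    """Posterior probability of membership in each component (Bayes' rule)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max (log-sum-exp) to avoid underflow
    expd = [math.exp(l - top) for l in logs]
    total = sum(expd)
    return [e / total for e in expd]

def assign(x, **kw):
    """Return the 1-based cluster label with the highest posterior,
    together with the full vector of posterior probabilities."""
    probs = posterior(x, **kw)
    return probs.index(max(probs)) + 1, probs

label, probs = assign([138.0, 11.5])
print(label)  # 1
```

Keeping the full probability vector, rather than only the label, lets borderline segments in a new compartment be flagged for inspection.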

Dan

On Mon, 2009-05-11 at 07:44 -0700, Dylan Beaudette wrote:
> See the clara() function from the cluster package. It scales fairly
> well to larger data sets.
> 
> Cheers,
> Dylan
> 
> On Mon, May 11, 2009 at 5:35 AM, Wesley Roberts <wroberts at csir.co.za> wrote:
> > Hi Dan,
> >
> > Thanks for the advice. I want to classify my data into three classes (canopy, non-canopy and ground) based on seven input variables: the mean, min, max, median, var, stdev and kurtosis of the spatially coincident spectra associated with each segment. I have 1916 cases, and the data are formatted like an ESRI attribute table in which each row corresponds to one segment:
> >       mean  min  max  median  var  stdev  kurtosis
> > 1     (values extracted from the imagery)
> > 2
> > 3
> > ...
> > 1916
> >
> > I would thus like to classify the segments into three classes and add a column to the attribute table with values 1, 2 and 3 denoting the class of each segment. Ideally the classification should be unsupervised, as the whole procedure should be as automatic as possible with limited input from the user. Initially I wanted to use lda (MASS), but it requires training classes.
> >
> > An alternative is to work from the hypothesis that segments with brighter spectra are more likely to come from tree crowns, and simply select the segments that fall above, for example, the 90th percentile and label those as tree crowns.
> >
> > Many thanks,
> > Wesley
> >
> >>>> Dan Putler <dan.putler at sauder.ubc.ca> 05/07/09 6:13 PM >>>
> > Hi Wesley,
> >
> > Is this a classification problem or a clustering problem? Specifically, is
> > the ultimate goal to predict which segment a new polygon belongs in, or
> > are you trying to form 3 segments to begin with based on the six
> > measures you have available? If it is the latter, it is a cluster
> > analysis problem rather than a classification problem, and you'll want
> > to look at the Cluster Analysis and Finite Mixture Models task view at
> > http://cran.r-project.org/web/views/Cluster.html.
> >
> > Dan
> >
> > On Thu, 2009-05-07 at 14:58 +0200, Wesley Roberts wrote:
> >> Dear R-sig-geo users,
> >>
> >> I have the output of a watershed segmentation in vector format (shapefile) whose attribute table is populated with statistics on the spectral reflectance of each polygon object. The attribute data were sourced from a geographically coincident aerial photograph. I would now like to classify the segments using the attribute data. This seems like an easy task, but I am struggling to find a suitable method. I have looked at 'lda' and 'qda' in the MASS package, but selecting an appropriate model using 'cv1EMtrain' takes a really long time. In essence, all I want to do is classify the 6-variable data set into 3 classes, with the class for each case recorded in the attribute table.
> >>
> >> Any advice or suggestions would be greatly appreciated.
> >>
> >> Many thanks and kind regards,
> >> Wesley
> >>
> > --
> > Dan Putler
> > Sauder School of Business
> > University of British Columbia
> >
> >
> >
> > --
> > This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
> > The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.
> >
> > This message has been scanned for viruses and dangerous content by MailScanner,
> > and is believed to be clean.  MailScanner thanks Transtec Computers for their support.
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >