[R] Variable Selection for data reduction and discriminant analysis

Mark Difford mark_difford at yahoo.co.uk
Mon Sep 22 08:48:47 CEST 2008


Hi Gareth,

>> My data is transformed to the clr or alr under Aitchison geometry, so I
>> am essentially working in Euclidean space.

Great: glad to hear it.
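Since the clr and alr transforms come up here, a minimal sketch of both may help readers who have not met them. It assumes a NumPy array whose rows are compositions with strictly positive parts (zeros would need replacing first). The names `closure`, `clr` and `alr` are hypothetical helpers, not the compositions package API, and using the last part as the alr denominator is an arbitrary choice.

```python
import numpy as np

def closure(x):
    """Rescale each row of a positive array so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part minus the mean log of its row,
    i.e. log of each part over the row's geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x, denom=-1):
    """Additive log-ratio: log of each part over one reference part.
    The reference column (last part by default, an arbitrary choice) is dropped."""
    logx = np.log(np.asarray(x, dtype=float))
    j = denom % logx.shape[-1]                 # turn a negative index into a positive one
    ratios = logx - logx[..., j:j + 1]         # subtract the log of the reference part
    return np.delete(ratios, j, axis=-1)       # reference column is all zeros, so drop it

# Two hypothetical 3-part measurements on different scales.
raw = np.array([[60.0, 30.0, 10.0],
                [20.0, 50.0, 30.0]])
comp = closure(raw)                            # rows now sum to 1
z = clr(comp)
print(np.allclose(z.sum(axis=1), 0.0))         # clr rows sum to zero -> True
print(np.allclose(clr(raw), z))                # closing the data does not change clr -> True
print(alr(comp).shape)                         # one fewer coordinate than parts -> (2, 2)
```

Because each clr row sums to zero, the clr covariance matrix is singular; for LDA one usually drops a coordinate (or uses alr/ilr coordinates) before the within-group covariance has to be inverted.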

>> Has anyone had experience doing stepwise LDA??  I can't for the life of
>> me find any help online about where to start.

A better option might be this: Trevor Hastie and a student of his have
recently put out a paper that takes penalized discriminant analysis a step
further, drawing, I think, on Trevor's sparse principal component analysis
method (in his elasticnet package).

http://www-stat.stanford.edu/~hastie/Papers/sda_line.pdf

You can get R-code to do the analysis on the first author's website; there's
a link in the paper.
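For comparison with the paper above (this is not a sketch of its method), here is a minimal Python sketch of the classical stepwise approach Gareth asked about: greedy forward selection using Wilks' lambda, det(W)/det(T), a criterion commonly used in stepwise discriminant analysis. It assumes the data are already in clr/alr (Euclidean) coordinates. `wilks_lambda` and `forward_select` are hypothetical helper names, and a fixed number of variables `k` stands in for the partial-F stopping rule that stepwise procedures usually use. With clr data, stop before all coordinates are selected, since the full clr set is singular.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda det(W)/det(T) for the columns of X given group labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centred = X - X.mean(axis=0)
    T = centred.T @ centred                    # total sums of squares and cross-products
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = X[y == g]
        cg = Xg - Xg.mean(axis=0)
        W += cg.T @ cg                         # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, k):
    """Greedy forward selection: at each step add the column that gives the
    smallest Wilks' lambda together with the columns already chosen."""
    X = np.asarray(X, dtype=float)
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(min(k, X.shape[1])):
        scores = {j: wilks_lambda(X[:, selected + [j]], y) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: 3 groups of 30, 5 variables, only column 2 separates the groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[:, 2] += 5.0 * y
print(forward_select(X, y, 2))                 # column 2 is chosen first
```

Greedy selection of this kind can overfit and its selected subset is unstable across samples, which is one reason a penalized approach such as the one in the paper may be preferable.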

Bye, Mark.


gcam032 wrote:
> 
> Thanks Mark,
> 
> I failed to mention that I'm working within a compositional framework.  I
> didn't want to confuse things.  My data is transformed to the clr or alr
> under Aitchison geometry, so I am essentially working in Euclidean space. 
> 
> Has anyone had experience doing stepwise LDA??  I can't for the life of me
> find any help online about where to start.
> 
> Thanks
> 
> Gareth
> 
> 
> Mark Difford wrote:
> Hi Gareth,
> 
>>> If I use the full composition (31 elements or variables), I can get
>>> reasonable separation of my 6 sources.
> 
> A word of advice: You need to be exceptionally careful when analyzing
> compositional data. Taking compositions puts your data values into a
> constrained/bounded space (generally called a simplex) so that most
> standard statistical procedures (i.e. anything that uses a Euclidean
> metric, and most do) deliver erroneous results. Pearson wrote a paper on
> this long ago, but it's generally been ignored (except by Aitchison and
> the Spanish School of mathematical statisticians).
> 
> The problem is comparatively well known to geologists, who work with
> compositional data much of the time. R has a very good package for analysing
> this data type: see the compositions package (a new release seems
> imminent). You will be able to get most of the main references from it.
> (The authors of the package also have a newly released article in one of
> the Elsevier journals [unfortunately my bibliography files are elsewhere,
> so I cannot give details]).
> 
> You could start by Wiki'ing your way to "compositional data".
> 
> HTH, Mark.
> 
> 
> 
> Gareth Campbell wrote:
>> 
>> Hello all,
>> 
>> I'm dealing with geochemical analyses of some rocks.
>> 
>> If I use the full composition (31 elements or variables), I can get
>> reasonable separation of my 6 sources.  Then when I go on to do LDA with
>> the 6 groups, I get excellent separation.
>> 
>> I feel like I should be reducing the variables to those that provide
>> the most discrimination between the groups, as this is important
>> information for me.  I struggle to interpret the PCA plot in a way that
>> helps me (due to the large number of elements), so I'm trying to do some
>> sort of stepwise variable selection.
>> 
>> I would love to hear from someone (possibly a geochemist or similar) who
>> does this regularly to determine the best course of action in R to do
>> this.
>> 
>> 
>> Thanks very much
>> 
>> 
>> -- 
>> Gareth Campbell
>> PhD Candidate
>> The University of Auckland
>> 
>> P +649 815 3670
>> M +6421 256 3511
>> E gareth.campbell at esr.cri.nz
>> gcam032 at gmail.com
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
> 
> 



-- 
View this message in context: http://www.nabble.com/Variable-Selection-for-data-reduction-and-discriminant-anlaysis-tp19591270p19602702.html
Sent from the R help mailing list archive at Nabble.com.


