[R] Variable Selection for data reduction and discriminant analysis
mark_difford at yahoo.co.uk
Mon Sep 22 08:48:47 CEST 2008
>> My data is transformed to the clr or alr under Aitchison geometry, so I
>> am essentially working in Euclidean space.
Great: glad to hear it.
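For readers new to these transforms, here is a minimal, language-neutral sketch of the clr and alr in Python. It assumes strictly positive parts, so zeros would need replacing first. The function names are illustrative only; the compositions package for R has its own implementations, which this does not reproduce.

```python
import math
from typing import List, Sequence


def closure(x: Sequence[float]) -> List[float]:
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x: Sequence[float]) -> List[float]:
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def alr(x: Sequence[float], ref: int = -1) -> List[float]:
    """Additive log-ratio: log of each part over a reference part (which is dropped)."""
    ref = ref % len(x)  # allow negative indices, e.g. -1 for the last part
    return [math.log(v / x[ref]) for i, v in enumerate(x) if i != ref]


def aitchison_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Aitchison distance: ordinary Euclidean distance between clr vectors."""
    return math.dist(clr(x), clr(y))
```

Both transforms ignore the overall scale, so [1, 2, 3] and [2, 4, 6] are the same composition. Note that clr components always sum to zero, so the covariance matrix of clr data is singular. Any method that inverts it, LDA included, needs one part dropped, or the alr (or a similar transform) used instead.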
>> Has anyone had experience doing stepwise LDA?? I can't for the life of
>> me find any help online about where to start.
A better option might be this: Trevor Hastie and a student of his have
recently put out a paper that goes a step beyond penalized discriminant
analysis, building, I think, on Trevor's sparse principal component analysis
method (in his elasticnet package).
You can get R code to do the analysis from the first author's website; there's
a link in the paper.
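For comparison with the penalized approach, the classical "stepwise LDA" Gareth asks about is usually forward selection by Wilks' lambda. The sketch below is a plain Python illustration of that idea and not the method in the Hastie paper. It is simplified in two ways: it adds the variable that most reduces lambda = det(W) / det(T) (within-group over total scatter) and stops after a fixed number of steps rather than applying an F-to-enter test. The function names are hypothetical.

```python
from typing import List, Sequence


def _det(m: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (near-)singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def wilks_lambda(X: Sequence[Sequence[float]], y: Sequence, cols: List[int]) -> float:
    """det(W) / det(T) for the variables in cols; smaller means better separation."""
    n, p = len(X), len(X[0])
    grand = [sum(r[j] for r in X) / n for j in range(p)]
    gmeans = {}
    for g in set(y):
        members = [r for r, lab in zip(X, y) if lab == g]
        gmeans[g] = [sum(r[j] for r in members) / len(members) for j in range(p)]
    k = len(cols)
    W = [[0.0] * k for _ in range(k)]  # within-group scatter
    T = [[0.0] * k for _ in range(k)]  # total scatter about the grand mean
    for r, lab in zip(X, y):
        dw = [r[c] - gmeans[lab][c] for c in cols]
        dt = [r[c] - grand[c] for c in cols]
        for i in range(k):
            for j in range(k):
                W[i][j] += dw[i] * dw[j]
                T[i][j] += dt[i] * dt[j]
    return _det(W) / _det(T)


def forward_select(X: Sequence[Sequence[float]], y: Sequence, max_vars: int) -> List[int]:
    """Greedily add the variable giving the smallest Wilks' lambda, up to max_vars."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda c: wilks_lambda(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Two cautions. Separation judged on the same data used to choose the variables will look optimistic, so the chosen subset should be cross-validated. And if fed clr scores, selecting every part makes T singular, because clr rows sum to zero.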
> Thanks Mark,
> I failed to mention that I'm working within a compositional framework. I
> didn't want to confuse things. My data is transformed to the clr or alr
> under Aitchison geometry, so I am essentially working in Euclidean space.
> Has anyone had experience doing stepwise LDA?? I can't for the life of me
> find any help online about where to start.
> Mark Difford wrote:
> Hi Gareth,
>>> If I use the full composition (31 elements or variables), I can get
>>> reasonable separation of my 6 sources.
> A word of advice: you need to be exceptionally careful when analyzing
> compositional data. Taking compositions puts your data values into a
> constrained/bounded space (generally called a simplex), so that most
> standard statistical procedures (i.e. anything that uses a Euclidean
> metric, and most do) deliver erroneous results, such as spurious
> correlations between parts. Pearson wrote a paper on this long ago, but
> it has generally been ignored (except by Aitchison and the Spanish School
> of mathematical statisticians).
> The problem is comparatively well known to geologists, who work with
> compositional data much of the time. R has a very good package for
> analysing this data type: see the compositions package (a new release
> seems imminent). You will be able to get most of the main references from it.
> (The authors of the package also have a newly released article in one of
> the Elsevier journals [unfortunately my bib files are elsewhere, so I cannot give
> You could start by Wiki'ing your way to "compositional data".
> HTH, Mark.
> Gareth Campbell wrote:
>> Hello all,
>> I'm dealing with geochemical analyses of some rocks.
>> If I use the full composition (31 elements or variables), I can get
>> reasonable separation of my 6 sources. Then when I go on to do LDA with
>> 6 groups, I get excellent separation.
>> I feel like I should be reducing the variables to those that provide
>> the most discrimination between the groups, as this is important
>> for me. I struggle to interpret the PCA plot in a way that helps me (due
>> to the large number of elements). So I'm trying to do some sort of
>> variable selection.
>> I would love to hear from someone (possibly a geochemist or similar) who
>> does this regularly to determine the best course of action in R to do
>> Thanks very much
>> Gareth Campbell
>> PhD Candidate
>> The University of Auckland
>> P +649 815 3670
>> M +6421 256 3511
>> E gareth.campbell at esr.cri.nz
>> gcam032 at gmail.com
View this message in context: http://www.nabble.com/Variable-Selection-for-data-reduction-and-discriminant-anlaysis-tp19591270p19602702.html
Sent from the R help mailing list archive at Nabble.com.