[R] Variable Selection for data reduction and discriminant analysis
Mark Difford
mark_difford at yahoo.co.uk
Mon Sep 22 08:48:47 CEST 2008
Hi Gareth,
>> My data is transformed to the clr or alr under Aitchison geometry, so I
>> am essentially working in Euclidean space.
Great: glad to hear it.
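
For anyone following the thread, here is a minimal base-R sketch of what the clr (centred log-ratio) transform does. The function name clr and the toy matrix comp are made up for illustration; the compositions package provides its own clr() and alr(), which are the ones to use in practice.

```r
# Centred log-ratio (clr) transform for a matrix of compositions.
# Rows are samples, columns are parts; every part must be strictly positive.
clr <- function(x) {
  x <- as.matrix(x)
  stopifnot(all(x > 0))
  lx <- log(x)
  # Subtract each row's mean log, i.e. the log of the row's geometric mean.
  lx - rowMeans(lx)
}

# Toy data (invented): 3 samples, 4 parts, each row closed to sum to 1.
comp <- matrix(c(0.10, 0.20, 0.30, 0.40,
                 0.25, 0.25, 0.25, 0.25,
                 0.70, 0.10, 0.10, 0.10),
               nrow = 3, byrow = TRUE)

z <- clr(comp)
rowSums(z)  # each row sums to zero, up to rounding error
```

One thing to watch: because every clr row sums to zero, the clr covariance matrix is singular, so LDA on all clr coordinates at once will run into collinearity. Working with the alr, or dropping one clr coordinate, avoids this.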
>> Has anyone had experience doing stepwise LDA?? I can't for the life of
>> me find any help online about where to start.
A better option might be this: Trevor Hastie and a student of his have
recently put out a paper that does a step-up from penalized discriminant
analysis based, I think, on Trevor's sparse principal component analysis
method (in his elasticnet package).
http://www-stat.stanford.edu/~hastie/Papers/sda_line.pdf
You can get R-code to do the analysis on the first author's website; there's
a link in the paper.
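
If you still want a plain stepwise approach as a baseline, the following is a rough sketch of greedy forward selection around lda() from MASS, scored by leave-one-out accuracy. This is not the method in the paper above. The function forward_lda and its arguments are hypothetical names of my own, and iris is only a stand-in for your transformed data.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

# Greedy forward selection: at each step add the variable that most improves
# leave-one-out (LOO) classification accuracy; stop when nothing improves.
forward_lda <- function(x, grp, max_vars = ncol(x)) {
  x <- as.matrix(x)
  grp <- factor(grp)

  # LOO accuracy of an LDA fitted on the given column indices.
  loo_acc <- function(cols) {
    fit <- lda(x[, cols, drop = FALSE], grouping = grp, CV = TRUE)
    # Count an NA prediction as a misclassification.
    sum(as.character(fit$class) == as.character(grp), na.rm = TRUE) / length(grp)
  }

  selected <- integer(0)
  best <- 0
  while (length(selected) < max_vars) {
    candidates <- setdiff(seq_len(ncol(x)), selected)
    acc <- vapply(candidates, function(j) loo_acc(c(selected, j)), numeric(1))
    if (max(acc) <= best) break          # no candidate improves the score
    best <- max(acc)
    selected <- c(selected, candidates[which.max(acc)])
  }
  list(index = selected, vars = colnames(x)[selected], loo_accuracy = best)
}

# Stand-in example: iris is not compositional, it only shows the mechanics.
res <- forward_lda(iris[, 1:4], iris$Species)
res$vars
res$loo_accuracy
```

Because the same LOO score is used both to choose the variables and to report accuracy, the final figure is optimistic; an outer cross-validation loop around the whole selection would give a fairer estimate. I believe the klaR package also offers ready-made stepwise routines (stepclass() and greedy.wilks()), but check its documentation before relying on that.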
Bye, Mark.
gcam032 wrote:
>
> Thanks Mark,
>
> I failed to mention that I'm working within a compositional framework. I
> didn't want to confuse things. My data is transformed to the clr or alr
> under Aitchison geometry, so I am essentially working in Euclidean space.
>
> Has anyone had experience doing stepwise LDA?? I can't for the life of me
> find any help online about where to start.
>
> Thanks
>
> Gareth
>
>
> Mark Difford wrote:
> Hi Gareth,
>
>>> If I use the full composition (31 elements or variables), I can get
>>> reasonable separation of my 6 sources.
>
> A word of advice: You need to be exceptionally careful when analyzing
> compositional data. Taking compositions puts your data values into a
> constrained/bounded space (generally called a simplex) so that most
> standard statistical procedures (i.e. anything that uses a Euclidean
> metric, and most do) deliver erroneous results. Pearson wrote a paper on
> this long ago, but it's generally been ignored (except by Aitchison and
> the Spanish School of mathematical statisticians).
>
> The problem is comparatively well known to geologists, who work with
> compositional data much of the time. R has a very good package for analysing
> this data type: see the compositions package (a new release seems
> imminent). You will be able to get most of the main references from it.
> (The authors of the package also have a newly released article in one of
> the Elsevier journals [unfortunately my bib files are elsewhere, so I
> cannot give details].)
>
> You could start by Wiki'ing your way to "compositional data".
>
> HTH, Mark.
>
>
>
> Gareth Campbell wrote:
>>
>> Hello all,
>>
>> I'm dealing with geochemical analyses of some rocks.
>>
>> If I use the full composition (31 elements or variables), I can get
>> reasonable separation of my 6 sources. Then when I go on to do LDA with
>> the 6 groups, I get excellent separation.
>>
>> I feel like I should be reducing the variables to those that are providing
>> the most discrimination between the groups, as this is important
>> information for me. I struggle to interpret the PCA plot in a way that
>> helps me (due to the large number of elements). So I'm trying to do some
>> sort of step-wise variable selection.
>>
>> I would love to hear from someone (possibly a geochemist or similar) who
>> does this regularly to determine the best course of action in R to do
>> this.
>>
>>
>> Thanks very much
>>
>>
>> --
>> Gareth Campbell
>> PhD Candidate
>> The University of Auckland
>>
>> P +649 815 3670
>> M +6421 256 3511
>> E gareth.campbell at esr.cri.nz
>> gcam032 at gmail.com