[R] Could not fit correct values in discriminant analysis by bruto.

Mon Apr 9 20:23:38 CEST 2007

Shuji,

My answer is "it depends". Using regression on zero/one dummy variables
can work, but there are more direct approaches to classification that
might work better.

In the case that you showed us, the data were linearly separable by one
predictor. If the data that you posted are representative of your
typical data sets, you might want to use a very simple approach:
identify the important predictor and use an ROC curve to find a good
cutoff for it. You don't need the other data.

However, if these data represent an "easy" case, you might want to try
other approaches to classification. I usually start with either
randomForest or a bagged tree (see the randomForest and ipred packages).
If I can get good performance with those models, I'll try simpler models
that are more interpretable, like single trees (the rpart package), FDA
with method = mars (using the mda package) and other approaches. 

Max

-----Original Message-----
From: ?? ?? [mailto:kawaguchi at math.kyushu-u.ac.jp] 
Sent: Monday, April 09, 2007 1:49 PM
To: Kuhn, Max
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Could not fit correct values in discriminant analysis
by bruto.

Dear Max,

Thank you very much ! Your sample code is very helpful.

In linear separable problem, I should use fda by linear regression
instead of bruto unless taking some dimensional reduction process,  
should I?

Cheers.

Shuji

On 2007/04/09, at 22:25, Kuhn, Max wrote:

> Shuji,
>
> I suspect that bruto blows up because your data are linearly  
> separable.
> To see this (if you didn't already know), try
>
>    library(lattice)
>    splom(~x, groups = y)
>
> and look at the first row. If you are trying to do classification,  
> there
> are a few methods that would choke on this (logistic regression) and a
> few that won't (trees, svms etc). I would guess that bruto is in the
> latter group.
>
> However, if you are try to do classification, try using bruto via fda:
>
>> tmp <- cbind(x, factor(y))
>>
>> fdaFit <- fda(y2~., tmp)
>> fdaFit
>    Call:
>    fda(formula = y2 ~ ., data = tmp)
>
>    Dimension: 1
>
>    Percent Between-Group Variance Explained:
>     v1
>    100
>
>    Degrees of Freedom (per dimension): 5
>
>    Training Misclassification Error: 0 ( N = 20 )
>>
>> predict(fdaFit, type = "posterior")[1:3,]
>      0 1
>    2 0 1
>    2 0 1
>    2 0 1
>
> Max
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of ?? ??
> Sent: Sunday, April 08, 2007 10:47 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Could not fit correct values in discriminant analysis by
> bruto.
>
> Dear R-users,
>
> I would like to use "bruto" function in mda package for flexible
> discriminant analysis.
> Then, I tried, for example, following approach.
>
>> x
>              band1              band2            band3
> 1   -1.206780      -1.448007      -1.084431
> 2   -0.294938      -0.113222      -0.888895
> 3   -0.267303      -0.241567      -1.040979
> 4   -1.206780      -1.448007      -1.040979
> 5   -1.151518      -0.806286      -0.671630
> 6   -1.179146      -1.396670      -1.453775
> 7   -0.294938      -0.241567      -1.453775
> 8   -0.350200      -0.267239      -1.084431
> 9   -1.151518      -0.857623      -0.649901
> 10  1.362954      -1.396670      -2.235926
> 11 -0.239675       1.118883       1.457551
> 12 -0.294938      -1.268325      -0.497817
> 13 -0.294938      -0.729278      -0.106745
> 14 -1.123883      -0.703612      -0.150196
> 15  0.616905       1.144548       -0.150196
> 16 -0.267303      1.657930         1.044750
> 17  1.611637      1.041874          0.610225
> 18 -1.123883     -0.677941         0.262605
> 19 -0.239675     -0.626604        -0.128473
> 20  2.274797       1.118883         1.805171
>
>> y
> [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
>
>> fit <- bruto(x,y)
>
> But, obtained fit$fitted.values are enormously high (or low) .
> Execution of bruto(x[,2:3], y) is done well (values are nearly 1 or  
> 0).
> Values of column 1 are wrong or appropriate option is needed?
> I contacted the package maintainer, but the problem could not be  
> solved.
>
> Thanks
>
> Shuji Kawaguchi
>
>> R.version
> platform       i386-apple-darwin8.8.1
> arch           i386
> os             darwin8.8.1
> system         i386, darwin8.8.1
> version.string R version 2.4.0 (2006-10-03)

----------------------------------------------------------------------
LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}