[R-sig-finance] bayesian signal classifier

Krishna Kumar kriskumar at earthlink.net
Tue Nov 29 06:07:52 CET 2005

Here is a simple example of how this could be done, using the 
mclust package.
(for more on mclust see here 

I have attached a test file. There are 11 columns: the first 10 columns 
are various features (signals),
and the last column (11) is the best thing to do based on 20/20 
hindsight, i.e. a buy or sell indicator.

 > require("mclust")
 > findata<-read.table("findata.txt",header=F)
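
Before fitting anything, it may be worth checking that the file was read as
expected. The following is only a minimal sketch: since the attachment may not
be at hand, it builds a small synthetic stand-in. The name fakeData, the 1/2
class coding, and the row count are assumptions for illustration, not the
contents of findata.txt.

```r
# Synthetic stand-in for findata.txt: 10 numeric feature columns plus a
# class column. All values and the 1/2 class coding are made up.
set.seed(1)
nRows <- 20
fakeData <- data.frame(matrix(rnorm(nRows * 10), nrow = nRows, ncol = 10),
                       cls = rep(c(1, 2), length.out = nRows))
names(fakeData) <- paste0("V", 1:11)  # names read.table(header=F) would assign

# Checks one might also run on the real findata
stopifnot(ncol(fakeData) == 11)                       # 10 features + 1 class column
stopifnot(all(sapply(fakeData[, 1:10], is.numeric)))  # features must be numeric
classCounts <- table(fakeData[, 11])                  # rows per class
print(classCounts)
```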

# Create a matrix out of the data

> finMatrix <- as.matrix(findata[,1:10])

# Identify column 11 as the classes and do the classification with default values.

> finClass <- findata[,11]
> finMclust <- Mclust(finMatrix,maxG=2)
> plot(finMclust,finMatrix)

Here we trained on the entire data set, and finMclust$classification gives
the decision made by the classifier.
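
Note that Mclust is only given the ten feature columns, so the cluster numbers
in finMclust$classification need not line up with the buy/sell codes in
column 11. A minimal sketch of how one might compare the two is to
cross-tabulate them and take the better of the two possible label matchings.
The vectors truth and cluster below are hypothetical toy values, not taken
from findata.txt.

```r
# Hypothetical labels: 'truth' plays the role of finClass,
# 'cluster' the role of finMclust$classification.
truth   <- c(1, 1, 1, 2, 2, 2, 2, 1)
cluster <- c(2, 2, 2, 1, 1, 2, 1, 2)

# Rows: true class, columns: cluster number
confusion <- table(truth = truth, cluster = cluster)
print(confusion)

# With two groups there are only two ways to match clusters to classes;
# report the error under the better matching.
errSame    <- mean(cluster != truth)        # cluster k taken to mean class k
errSwapped <- mean((3 - cluster) != truth)  # cluster 1 taken to mean class 2, and vice versa
clusterError <- min(errSame, errSwapped)
print(clusterError)
```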

Now suppose you want to train on a subset, say all the odd rows 
(alternatively, one can cross-validate with ?sample or even bootstrap a training data set):

> odd <- seq(from=1, to=nrow(findata), by=2)
> even <- seq(from=2, to=nrow(findata), by=2)
> round(cv1EMtrain(data = findata[odd,-11], labels = findata[odd,11]),3)

This will show that the VVI model would be selected based on the training data (all the odd rows).

> vviModd <- mstepVVI(data=findata[odd,-11], z=unmap(findata[odd,11]))
> vviZ <- do.call("estepVVI", c(vviModd, list(data=findata[,-11])))$z
> classError(map(vviZ[odd,]), findata[odd,11])
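
For readers unfamiliar with the helpers used here: unmap turns a vector of
class labels into a 0/1 membership matrix (one column per class), and map goes
the other way, assigning each row to the column with the largest value. The
base-R sketch below mimics that behaviour on toy values. The names toyLabels,
toyZ, toyPost and the probabilities are made up, and the code is an
illustration rather than mclust's actual implementation.

```r
toyLabels <- c(1, 2, 2, 1)
classes <- sort(unique(toyLabels))

# unmap-like step: indicator matrix, one row per observation, one column per class
toyZ <- 1 * outer(toyLabels, classes, "==")

# map-like step: made-up posterior probabilities -> most probable class per row
toyPost <- rbind(c(0.9, 0.1),
                 c(0.2, 0.8),
                 c(0.6, 0.4),
                 c(0.7, 0.3))
toyPred <- classes[max.col(toyPost, ties.method = "first")]

# Fraction of rows whose predicted class differs from the label
toyError <- mean(toyPred != toyLabels)
```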

How do we do on the test data?

 > classError(map(vviZ[even,]), findata[even,11])

===  0.04081633

Hmmm. So a classification error of about 4% on the test data... whew! (Maybe 
I am missing something.)
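
The odd/even split above is deterministic. As a rough sketch of the ?sample
alternative mentioned earlier, the indices for a random hold-out split or a
bootstrap training set could be generated as follows. The row count n, the
50/50 proportion, and the seed are assumptions for illustration; with the real
data one would use nrow(findata).

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 100       # hypothetical number of rows

# Random hold-out split: half the rows for training, the rest for testing
trainIdx <- sort(sample(n, size = floor(n / 2)))
testIdx  <- setdiff(seq_len(n), trainIdx)

# Bootstrap training set: n draws with replacement; rows never drawn
# ("out-of-bag") can serve as test data
bootIdx <- sample(n, size = n, replace = TRUE)
oobIdx  <- setdiff(seq_len(n), bootIdx)
```

These index vectors would take the place of odd and even, e.g.
findata[trainIdx, -11] for training and findata[testIdx, -11] for testing.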


paul sorenson wrote:

>I would be interested in the paper thanks.  Unfortunately my level of 
>expertise is not high in these matters.
>I may have just misunderstood yours and Krishna's response, the kind of 
>paradigm I am thinking is:
>	- User selects signals he/she wants to monitor.
>	- When the user makes a buy/sell decision, the classifier then looks at 
>the parameters of those signals and classifies the conditions for that 
>	- The user continues to train the classifier in this way, analogously 
>to training a spam filter.
>	- The classifier then can start emitting buy/sell signals based on the 
>training.  I.e. it is personalized to that user's previous choices.
>I only mentioned Bayesian methods because the most effective spam 
>filtering I have used is apparently based on that method 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: findata.txt
Url: https://stat.ethz.ch/pipermail/r-sig-finance/attachments/20051129/7d5944ba/findata.txt
