[R-SIG-Finance] Accessing the output of the MINE routine
Matt Considine
matt at considine.net
Mon Dec 26 23:34:54 CET 2011
I wanted to work in R with the output of the MINE routine found here
http://www.exploredata.net
and after a while came up with the code below. It reads the output file
that MINE generates and lets one work with the various columns as matrices.
Better R programmers than me will wince at the code, so hopefully if
someone has a more elegant way of accomplishing the same task they will
share it.
#needed for MINE routine
require(rJava)
#load market data
require(PortfolioAnalytics)
data(indexes)
#write CSV file of data to current working directory
datafilename <- "indexes.csv"
write.table(indexes, datafilename, sep=",", col.names=TRUE,
row.names=FALSE, quote=FALSE, na="NA")
#read MINE R code (base source() is used here because
# source.with.encoding() is only defined inside RStudio)
source('MINE.r', encoding='UTF-8')
#run MINE routine on data
MINE(datafilename,"all.pairs")
#read output of MINE routine
#file name could be figured out algorithmically
#data is sorted in descending order of MIC variable
#output is half of a square symmetric matrix, excluding diagonal
#there are 9 columns, 7 of which are various stats
outputfilename <- paste(datafilename,",B=n^0.6,k=15,Results.csv",sep="")
#or outputfilename <- sprintf("%s,B=n^0.6,k=15,Results.csv",datafilename)
x<-read.csv(outputfilename,header=TRUE)
#isolate MIC data and names of variable pairs for this example
#figure out what variables go where based on frequency and knowing
# it is a half of a symmetric matrix
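#tabulate how often each variable appears in the X and Y columns,
# keeping each set of counts as a one-column matrix. we need to look at
# both to get the complete list of variables and their frequencies
m2x<-as.matrix(table(x$X.var))
m2y<-as.matrix(table(x$Y.var))
#get the frequencies for each row, looked up by name so this works
# whether read.csv returned factors or character strings
testx<-unname(m2x[as.character(x$X.var),1])
testy<-unname(m2y[as.character(x$Y.var),1])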
#add the frequencies to the original data
x2<-cbind(x,testx,testy)
#sort rows by decreasing frequency of the first variable (testx) and,
# within those groups, by ascending frequency of the second (testy)
fx <- x2[order(-x2$testx,x2$testy),]
#create the correct sized matrix
z<-diag(length(m2x)+1)
#Now extract the data we want, in this case column 3 (MIC)
z[row(z)>col(z)]<-fx[,3]
z<-z+t(z)
diag(z)<-1
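#create col/row names: the X variables in decreasing frequency, followed
# by the one variable that appears only in the Y column
fxnames<-c(names(m2x[order(m2x,decreasing=TRUE),]),
names(m2y[order(m2y,decreasing=TRUE),])[1])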
colnames(z)<-fxnames
rownames(z)<-fxnames
z
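For what it is worth, the same reshaping can also be sketched without relying on the sort order of the rows, by indexing a named matrix with the variable names directly. This is only a sketch under assumptions: the toy data frame, its values, the column name MIC and the helper pairs_to_matrix() are made up for illustration and are not part of MINE or its output.

```r
# Toy stand-in for the MINE results: one row per unordered pair.
# The variable names and values are invented for this example.
res <- data.frame(
  X.var = c("A", "A", "A", "B", "B", "C"),
  Y.var = c("B", "C", "D", "C", "D", "D"),
  MIC   = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a pair list into a square symmetric matrix.
pairs_to_matrix <- function(df, value_col, xcol = "X.var", ycol = "Y.var",
                            diag_value = 1) {
  xv <- as.character(df[[xcol]])
  yv <- as.character(df[[ycol]])
  vars <- sort(unique(c(xv, yv)))
  m <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  # A two-column character matrix selects one cell per row by dimnames
  idx <- cbind(xv, yv)
  m[idx] <- df[[value_col]]
  # Mirror the values into the other triangle
  m[idx[, 2:1, drop = FALSE]] <- df[[value_col]]
  diag(m) <- diag_value
  m
}

mic <- pairs_to_matrix(res, "MIC")
mic
```

With the real results one would pass x and whatever name read.csv gave the column of interest. Pairs missing from the file stay NA instead of shifting the other values.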
Hope this helps someone,
Matt