[R] Optimizing the loop for large data

Tue Feb 27 02:56:18 CET 2007

Rusers:

I am trying to apply a quadratic discriminant function to find the best 
classification outcomes.
1 is assigned to the values greater than a threshold value; and 0 otherwise.
I would like to see how the apparent error rates and the optimal error 
rate change with increasing threshold values.

I have a 1000*10 data matrix: n=1000 and p=10.

Here is what I wrote so far, but seems to be inefficient. I appreciate 
if someone help me out.

library(foreign)
library(MASS)

D<-read.dbf('data/Indianapolis015.dbf') # import a data

# data looks like this

          LONGLAB         X        Y Perimeter       Area    X_UTM   
Y_UTM   F0 F1  F2
1  TAZ 18011:1000 -86.25985 39.95286  2.061630  0.1862549 50600.38 
4435792  235  0  35
2  TAZ 18011:1001 -86.31030 39.97591  3.657006  0.7305006 46440.80 
4438608    0  0   0
3  TAZ 18011:1002 -86.29542 39.97054  3.516089  0.6408084 47677.31 
4437936  155  0  15
4  TAZ 18011:1003 -86.27574 39.97294  5.000185  1.2592142 49374.91 
4438102  835  0  55
5  TAZ 18011:1004 -86.25967 39.97197  4.788531  1.1930984 50741.38 
4437913  425  0  80
6  TAZ 18011:1005 -86.29245 39.98580  6.189141  1.6734483 48031.44 
4439616  185  0  35
7  TAZ 18011:1006 -86.24899 39.98259  7.525633  2.0564466 51723.80 
4439040  505  0  45
8  TAZ 18011:1007 -86.30974 39.99014  3.773037  0.7790234 46583.20 
4440186   30  0  10
9  TAZ 18011:1008 -86.27151 39.99040  4.589226  1.2212674 49850.92 
4440021   40  0   0
10 TAZ 18011:9215 -86.58085 40.13588 37.278521 69.6681954 24438.13 
4457794 2095 85 200

thrs<-seq(1000,10000,length=50)
ED<-D[,383]/D[,5] # employment density
CBDx<-D[,6]-58277.194363 # convert a coordinate for x
CBDy<-D[,7]-4414486.03135 # convert a coordinate for y

AER<-vector("numeric",length(thrs))
OER<-vector("numeric",length(thrs))
MER<-vector("numeric",length(thrs))

# compute the apparent error rates for each threshold value
for (j in 1:length(thrs)){
ctgy<-ifelse(ED>thrs[j],2,1) # 2 categories are created by the threshold
test1<-qda(cbind(ED,CBDx,CBDy),ctgy)
est1<-cbind(ctgy,predict(test1)$class)
AER[j]<-sum((est1[,1]-est1[2])==0)/dim(D)[1]
}

# OER computation for ith location taken out for the thresholds
for (k in 1:dim(D)[1]){
for (j in 1:length(thrs)){
ctgy<-ifelse(ED>thrs[j],2,1)
test2<-qda(cbind(ED[-k],CBDx[-k],CBDy[-k]),ctgy[-k])
est2<-cbind(ctgy[-k],predict(test2)$class)
OER[j]<-mean(sum((est2[,1]-est2[2])==0)/(dim(D)[1]-1))
}}