[R] Optimizing the loop for large data
Takatsugu Kobayashi
tkobayas at indiana.edu
Tue Feb 27 02:56:18 CET 2007
Rusers:
I am trying to apply a quadratic discriminant function to find the best
classification outcomes.
1 is assigned to the values greater than a threshold value; and 0 otherwise.
I would like to see how the apparent error rates and the optimal error
rate change with increasing threshold values.
I have a 1000*10 data matrix: n=1000 and p=10.
Here is what I wrote so far, but seems to be inefficient. I appreciate
if someone help me out.
library(foreign)
library(MASS)
D<-read.dbf('data/Indianapolis015.dbf') # import a data
# data looks like this
LONGLAB X Y Perimeter Area X_UTM
Y_UTM F0 F1 F2
1 TAZ 18011:1000 -86.25985 39.95286 2.061630 0.1862549 50600.38
4435792 235 0 35
2 TAZ 18011:1001 -86.31030 39.97591 3.657006 0.7305006 46440.80
4438608 0 0 0
3 TAZ 18011:1002 -86.29542 39.97054 3.516089 0.6408084 47677.31
4437936 155 0 15
4 TAZ 18011:1003 -86.27574 39.97294 5.000185 1.2592142 49374.91
4438102 835 0 55
5 TAZ 18011:1004 -86.25967 39.97197 4.788531 1.1930984 50741.38
4437913 425 0 80
6 TAZ 18011:1005 -86.29245 39.98580 6.189141 1.6734483 48031.44
4439616 185 0 35
7 TAZ 18011:1006 -86.24899 39.98259 7.525633 2.0564466 51723.80
4439040 505 0 45
8 TAZ 18011:1007 -86.30974 39.99014 3.773037 0.7790234 46583.20
4440186 30 0 10
9 TAZ 18011:1008 -86.27151 39.99040 4.589226 1.2212674 49850.92
4440021 40 0 0
10 TAZ 18011:9215 -86.58085 40.13588 37.278521 69.6681954 24438.13
4457794 2095 85 200
thrs<-seq(1000,10000,length=50)
ED<-D[,383]/D[,5] # employment density
CBDx<-D[,6]-58277.194363 # convert a coordinate for x
CBDy<-D[,7]-4414486.03135 # convert a coordinate for y
AER<-vector("numeric",length(thrs))
OER<-vector("numeric",length(thrs))
MER<-vector("numeric",length(thrs))
# compute the apparent error rates for each threshold value
for (j in 1:length(thrs)){
ctgy<-ifelse(ED>thrs[j],2,1) # 2 categories are created by the threshold
test1<-qda(cbind(ED,CBDx,CBDy),ctgy)
est1<-cbind(ctgy,predict(test1)$class)
AER[j]<-sum((est1[,1]-est1[2])==0)/dim(D)[1]
}
# OER computation for ith location taken out for the thresholds
for (k in 1:dim(D)[1]){
for (j in 1:length(thrs)){
ctgy<-ifelse(ED>thrs[j],2,1)
test2<-qda(cbind(ED[-k],CBDx[-k],CBDy[-k]),ctgy[-k])
est2<-cbind(ctgy[-k],predict(test2)$class)
OER[j]<-mean(sum((est2[,1]-est2[2])==0)/(dim(D)[1]-1))
}}
More information about the R-help
mailing list