Tue May 12 17:10:46 CEST 2009
I am exploring neural networks (adding non-linearities) to see if I can
get more predictive power than a linear regression model I built. I am
using the function nnet and following the example of Venables and
Ripley, in Modern Applied Statistics with S, on pages 246 to 249. I have
standardized variables (z-scores) such as assets, age and tenure. I have
other variables that are binary (0 or 1). In max_acc_ownr_nwrth_n_med
for example, the variable has a value of 1 if the client's net worth is
above the median net worth and a value of 0 otherwise. These are derived
variables that I created and that the regression algorithm has found
to be predictive. A regression on the same variables shown below gives
me an R-squared of about 0.12. I am trying to increase the predictive
power of this regression model with a neural network, while being
careful to avoid overfitting.
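For reference, a minimal sketch of how variables of the two kinds described above (z-scores and above-median indicators) could be derived. The data frame `clients` and its columns are made up for illustration and are not my real data:

```r
set.seed(3)
# Hypothetical raw data, for illustration only
clients <- data.frame(assets    = rlnorm(100, meanlog = 12, sdlog = 1),
                      net_worth = rlnorm(100, meanlog = 13, sdlog = 1))

# z-score: subtract the mean and divide by the standard deviation
clients$assets_z <- as.numeric(scale(clients$assets))

# binary indicator: 1 if net worth is above the median, 0 otherwise
clients$nwrth_above_med <- as.integer(
  clients$net_worth > median(clients$net_worth, na.rm = TRUE))

summary(clients[, c("assets_z", "nwrth_above_med")])
```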
Similar to Venables and Ripley, I used the following code:
> library(nnet)
> dim(coreaff.trn.nn)
[1] 5088 8
> head(coreaff.trn.nn)
  hh.iast.y WC_Total_Assets all_assets_per_hh         age      tenure
1   3059448      -0.4692186        -0.4173532 -0.06599001 -1.04747935
2   4899746       3.4854334         4.0111164 -0.06599001 -0.72540200
3    727333      -0.2677357        -0.4177944 -0.30136473 -0.40332465
4    443138      -0.5295170        -0.6999646 -0.14444825 -1.04747935
5    484253      -0.6112205        -0.7306664  0.64013414  0.07979137
6    799054       0.6580506         1.1763114  0.24784295  0.07979137
  max_acc_ownr_liq_asts_n_med max_acc_ownr_nwrth_n_med max_acc_ownr_ann_incm_n_med
1                           0                        1                           0
2                           1                        1                           1
3                           1                        1                           1
4                           0                        0                           0
5                           1                        0                           0
6                           0                        1                           1
> coreaff.nn1 <- nnet(hh.iast.y ~ WC_Total_Assets + all_assets_per_hh +
+     age + tenure + max_acc_ownr_liq_asts_n_med +
+     max_acc_ownr_nwrth_n_med + max_acc_ownr_ann_incm_n_med,
+     coreaff.trn.nn, size = 2, decay = 1e-3,
+     linout = T, skip = T, maxit = 1000, Hess = T)
# weights: 26
initial value 12893652845419998.000000
iter 10 value 6352515847944854.000000
final value 6287104424549762.000000
converged
> summary(coreaff.nn1)
a 7-2-1 network with 26 weights
options were - skip-layer connections linear output units decay=0.001
     b->h1    i1->h1    i2->h1    i3->h1    i4->h1    i5->h1    i6->h1    i7->h1
 -21604.84  -2675.80  -5001.90  -1240.16   -335.44 -12462.51 -13293.80  -9032.34
     b->h2    i1->h2    i2->h2    i3->h2    i4->h2    i5->h2    i6->h2    i7->h2
 210841.52  47296.92  58100.43 -13819.10  -9195.80 117088.99 131939.57 106994.47
       b->o      h1->o      h2->o      i1->o      i2->o      i3->o      i4->o      i5->o      i6->o      i7->o
 1115190.67  894123.33 -417269.57   89621.84  170268.12   44833.63   59585.05  112405.30  437581.05  244201.69
> sum((hh.iast.y - predict(coreaff.nn1))^2)
Error: object "hh.iast.y" not found
So I try:
> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)
Error: dims [product 5053] do not match the length of object [5088]
In addition: Warning message:
In coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1) :
longer object length is not a multiple of shorter object length
Doing a little debugging:
> pred <- predict(coreaff.nn1)
> dim(pred)
[1] 5053 1
> dim(coreaff.trn.nn)
[1] 5088 8
So pred has 5,053 rows, while the input dataset has 5,088 records.
It looks like the neural network is dropping 35 records. Does anyone
have any idea why it would do this? Most probably those 35 records are
"bad" data, a pretty common occurrence in the real world.
Does anyone know how I can identify the dropped records? If I can do
this, I can reduce the input dataset to the same 5,053 records, and then:
> sum((coreaff.trn.nn$hh.iast.y - predict(coreaff.nn1))^2)
would work.
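The summary of my dataset reports 35 NA's in one column, which matches the 35 missing rows. So one plausible explanation (an assumption, not something I have confirmed) is that the formula interface of nnet removed the incomplete rows through the default na.action. A minimal sketch of how such rows might be located with complete.cases() and the sum of squares computed on the remaining rows; the data frame `dat` and its columns are made up and stand in for coreaff.trn.nn:

```r
library(nnet)

set.seed(1)
# Hypothetical stand-in for coreaff.trn.nn: 200 rows, 5 with a missing predictor
dat <- data.frame(y  = rnorm(200),
                  x1 = rnorm(200),
                  x2 = rbinom(200, 1, 0.5))
dat$x1[c(3, 50, 77, 120, 199)] <- NA

# Row numbers with at least one NA among the variables used in the model
model_vars <- c("y", "x1", "x2")
dropped <- which(!complete.cases(dat[, model_vars]))
dropped

fit <- nnet(y ~ x1 + x2, data = dat, size = 2, decay = 1e-3,
            linout = TRUE, skip = TRUE, maxit = 200, trace = FALSE)

# Keep only the complete rows so the actual values line up with predict()
kept <- dat[complete.cases(dat[, model_vars]), ]
stopifnot(nrow(predict(fit)) == nrow(kept))

# Residual sum of squares on the rows that were actually used
sse <- sum((kept$y - predict(fit))^2)
sse
```

If this explanation is right, the same complete.cases() filter applied to coreaff.trn.nn should leave 5,053 rows.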
A summary of my dataset is:
> summary(coreaff.trn.nn)
    hh.iast.y        WC_Total_Assets     all_assets_per_hh           age
 Min.   :       0   Min.   :-6.970e-01   Min.   :-8.918e-01   Min.   :-4.617e+00
 1st Qu.:  565520   1st Qu.:-5.387e-01   1st Qu.:-6.147e-01   1st Qu.:-4.583e-01
 Median :  834164   Median :-3.160e-01   Median :-3.718e-01   Median : 9.093e-02
 Mean   : 1060244   Mean   : 2.948e-13   Mean   : 3.204e-12   Mean   :-1.884e-11
 3rd Qu.: 1207181   3rd Qu.: 1.127e-01   3rd Qu.: 1.891e-01   3rd Qu.: 5.617e-01
 Max.   :45003160   Max.   : 1.332e+01   Max.   : 4.011e+00   Max.   : 5.818e+00
                                                              NA's   : 3.500e+01
      tenure          max_acc_ownr_liq_asts_n_med
 Min.   :-1.209e+00   Min.   :0.0000
 1st Qu.:-7.254e-01   1st Qu.:0.0000
 Median :-2.423e-01   Median :0.0000
 Mean   :-3.302e-12   Mean   :0.4951
 3rd Qu.: 5.629e-01   3rd Qu.:1.0000
 Max.   : 4.267e+00   Max.   :1.0000
 max_acc_ownr_nwrth_n_med   max_acc_ownr_ann_incm_n_med
 Min.   :0.0                Min.   :0.0000
 1st Qu.:0.0                1st Qu.:0.0000
 Median :0.5                Median :0.0000
 Mean   :0.5                Mean   :0.3634
 3rd Qu.:1.0                3rd Qu.:1.0000
 Max.   :1.0                Max.   :1.0000
While I am writing this post, I have a few other questions.
I know I can compare two regression models using:
anova(model1, model2)
Will this work if one of the models is a regression model and the other
is a neural network? I have not yet reached the point in building the
neural network where I can try this. If it does not work, is there
another way to compare the performance of a regression model and a
neural network? Otherwise I may have to program the comparison myself:
I can probably use predict() to get one vector of predictions for the
regression model and another for the neural network, and then compare
both against the actual values.
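A rough sketch of the predict()-based comparison I have in mind, scoring both models on the same holdout rows. Everything here is illustrative: the simulated data, the 350/150 split and the helper `holdout_metrics` are my own assumptions, not part of nnet or of my real analysis:

```r
library(nnet)

set.seed(42)
# Hypothetical data with a mild non-linearity in x1
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
dat$y <- 1 + 0.5 * dat$x1 + 0.3 * dat$x2 + 0.4 * dat$x1^2 + rnorm(n)

# Hold out 150 rows that neither model sees during fitting
train_idx <- sample(n, 350)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit_lm <- lm(y ~ x1 + x2, data = train)
fit_nn <- nnet(y ~ x1 + x2, data = train, size = 2, decay = 1e-3,
               linout = TRUE, skip = TRUE, maxit = 500, trace = FALSE)

# Hypothetical helper: RMSE and R-squared of predictions against actual values
holdout_metrics <- function(actual, predicted) {
  predicted <- as.numeric(predicted)   # predict() on nnet returns a matrix
  sse <- sum((actual - predicted)^2)
  sst <- sum((actual - mean(actual))^2)
  c(rmse = sqrt(sse / length(actual)), r2 = 1 - sse / sst)
}

res <- rbind(lm   = holdout_metrics(test$y, predict(fit_lm, newdata = test)),
             nnet = holdout_metrics(test$y, predict(fit_nn, newdata = test)))
res
```

Scoring on held-out rows rather than on the training data is meant to guard against rewarding an overfitted network.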
Is there any R package that can produce lift charts (ROC curves, gains
tables, etc.), K-S statistic, etc., that can be used to quantify the
performance of a predictive model (as done in database marketing)? If
so, such a package can be used to compare a regression model and a
neural network.
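To make clear what I mean by a gains table, here is a base-R sketch for a continuous outcome: cases are sorted by predicted value, cut into ten equal groups, and the mean actual value in each group is compared with the overall mean. The function `gains_table` and the simulated `score` and `actual` vectors are hypothetical, written only for this illustration:

```r
set.seed(7)
n <- 1000
score  <- rnorm(n)                            # hypothetical model predictions
actual <- 10 + 2 * score + rnorm(n, sd = 3)   # hypothetical observed outcome

# Hypothetical helper: decile gains table for a continuous outcome
gains_table <- function(actual, score, groups = 10) {
  ord    <- order(score, decreasing = TRUE)   # best-scored cases first
  bucket <- ceiling(seq_along(ord) * groups / length(ord))
  mean_actual <- as.vector(tapply(actual[ord], bucket, mean))
  data.frame(decile      = seq_len(groups),
             n           = as.vector(table(bucket)),
             mean_actual = mean_actual,
             lift        = mean_actual / mean(actual))
}

gt <- gains_table(actual, score)
gt
```

Running the same function on the predictions of each model would give two tables that can be compared decile by decile.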
My last question: can any of the neural network packages in R
(nnet, AMORE, neural, neuralnet, or others I do not know about) do
variable selection (the way the regression methods do)? Or must I do
this manually, by looking at the weights and pruning the network,
eliminating weights close to zero (at all the layers in the network)?
Thanks in advance,
Jude
