[R-SIG-Finance] Trying to get earth models to (better) match those from other tools

Mark Knecht markknecht at gmail.com
Tue Apr 16 00:42:32 CEST 2013


Hi,
   Does anyone have experience comparing the MARS models created in R
using the earth package with those created by commercially available
tools? I'm wondering which settings I might need to tweak to get the
two to match more closely.

   I'm currently attempting to build a model of a signal used to trade
GS using a neural network program. The target (last column in the
data) is an 'ideal' signal for the neural network that produces
excellent returns. It's mathematically generated after the fact and
cannot be used for real trading. The inputs are standard indicators.
My models in both R and Salford SPM use the same data set, 100%
in-sample for now, and while the models are sometimes similar they
never match. There's code below that you can run yourself if you'd
like to reproduce the R side of what I'm posting here. Here are two
very simple models exhibiting a typical difference:

From R:
TargetData =
  1.4277855
  +  2.7607779 * pmax(0,      ATR.30. -        5.003)
  -  1.5817456 * pmax(0,        5.003 -      ATR.30.)
  + 0.47233563 * pmax(0, SDev..C..10. -     4.536255)
  -  1.0922087 * pmax(0, SDev..C..30. -     6.659525)
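Each `pmax(0, x - k)` term is a MARS hinge function, so the printed earth model can be evaluated directly in base R. The sketch below simply transcribes the equation above into a function; the name `earth_eq` is illustrative and not part of the earth package. ATR.30. enters as a mirrored pair of hinges at the same knot (5.003), giving a piecewise-linear response whose slope in ATR.30. is about +1.58 below the knot and about +2.76 above it. For real use, `predict(model_earth)` returns the fitted values without retyping coefficients, and agrees with this function up to the rounding of the printed coefficients.

```r
# Hand transcription of the earth model printed with style = "pmax".
# d is a data frame with the column names used in Test_Data.
earth_eq <- function(d) {
  1.4277855 +
    2.7607779  * pmax(0, d$ATR.30. - 5.003) -      # hinge above the knot
    1.5817456  * pmax(0, 5.003 - d$ATR.30.) +      # mirrored hinge below it
    0.47233563 * pmax(0, d$SDev..C..10. - 4.536255) -
    1.0922087  * pmax(0, d$SDev..C..30. - 6.659525)
}
```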

From Salford SPM:
 BF1 = max(0, SDEV__C__30_ - 7.7014);
 BF3 = max(0, PDIFFEMA__C__100_ + 0.124316);
 BF5 = max(0, 4.97535 - SDEV__C__20_);
 BF7 = max(0, 0.005878 - PDIFFEMA__C__2_);

 Y = 0.0886771 - 1.13545 * BF1 + 10.4891 * BF3 + 0.677875 * BF5
               + 16.2416 * BF7
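To compare the two fits on the same data, the Salford model can also be evaluated in R. The sketch below assumes that SPM's variable names are the R column names with dots replaced by underscores and upper-cased (e.g. `SDEV__C__30_` for `SDev..C..30.`); that mapping is inferred from the names, not confirmed. The function name `salford_eq` is hypothetical.

```r
# Salford SPM basis functions rewritten with R's pmax(), using the assumed
# name mapping SDEV__C__30_ -> SDev..C..30., PDIFFEMA__C__100_ -> PDiffEMA..C..100., etc.
salford_eq <- function(d) {
  BF1 <- pmax(0, d$SDev..C..30. - 7.7014)
  BF3 <- pmax(0, d$PDiffEMA..C..100. + 0.124316)
  BF5 <- pmax(0, 4.97535 - d$SDev..C..20.)
  BF7 <- pmax(0, 0.005878 - d$PDiffEMA..C..2.)
  0.0886771 - 1.13545 * BF1 + 10.4891 * BF3 + 0.677875 * BF5 + 16.2416 * BF7
}
```

After running the script at the end of this message, `salford_eq(Test_Data)` returns one fitted value per row.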

   As you can see, the two environments chose very different indicators
and came up with very different results. So far, in most (approx. 75%)
of the cases I've run, the Salford model has lower residuals, at least
compared with my rather simple code of the type below, but I bet
there are better settings for the earth package than the ones I'm
using here.
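For a like-for-like residual comparison, the same measures can be computed from each model's fitted values. earth's `summary()` already reports RSS and RSq for the earth fit; the helpers below (names `rss` and `rsq` are illustrative) simply make it easy to compute the same numbers for fitted values coming from another tool. Both models above have five coefficients (intercept plus four hinge terms), so in-sample RSS is a reasonable first comparison, though it remains an in-sample measure only.

```r
# Residual sum of squares and in-sample R-squared for fitted values yhat.
rss <- function(y, yhat) sum((y - yhat)^2)
rsq <- function(y, yhat) 1 - rss(y, yhat) / sum((y - mean(y))^2)
```

With the script below, `rss(Test_Data$Target8, predict(model_earth))` gives the earth model's RSS; passing the Salford fitted values as `yhat` gives the corresponding figure for SPM.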

   In the end I don't think it's critically important that R exactly
reproduces what Salford's tool puts out, but I'd like to understand
what's driving the differences if I can. I hope someone with more
experience in this area can share some pointers.

Thanks in advance,
Mark



library(earth)

Test_Data = structure(list(ATR.200. = c(5.3593, 5.38645, 5.38465, 5.39745,
5.4037, 5.41655, 5.3925, 5.40095, 5.4075, 5.41, 5.43405, 5.4421,
5.4421, 5.4615, 5.47085, 5.4571, 5.4549, 5.4062, 5.4084, 5.4446,
5.45275, 5.459, 5.41465, 5.3884, 5.3553, 5.3612, 5.35245, 5.3256,
5.30565, 5.2844), ATR.30. = c(4.728667, 4.709667, 4.666333, 4.635,
4.672667, 4.620333, 4.497667, 4.570667, 4.566333, 4.654, 4.808,
4.853667, 4.957667, 5.057667, 5.071667, 4.973667, 5.003, 4.899,
5.063667, 5.453, 5.463333, 5.525667, 5.536333, 5.624, 5.765667,
5.794667, 5.773667, 5.742333, 5.805, 5.807), ATR.50. = c(5.5326,
5.5312, 5.4926, 5.465, 5.4488, 5.4586, 5.4524, 5.496, 5.4372,
5.3484, 5.4258, 5.4182, 5.3544, 5.2906, 5.2404, 5.148, 5.1268,
4.977, 5.037, 5.2382, 5.2758, 5.2658, 5.2636, 5.225, 5.2814,
5.2574, 5.155, 5.1488, 5.1864, 5.194), PDiffEMA..C..10. = c(0.054091,
0.034275, 0.009988, 0.064954, 0.090363, 0.040484, 0.047534, -0.021532,
-0.024618, -0.02721, 0.009576, 0.013524, 0.045701, 0.08149, 0.091634,
0.074339, 0.103592, 0.081728, 0.011472, 0.131327, 0.08382, 0.030076,
0.012852, -0.002183, 0.027274, 0.037493, 0.017167, 0.023488,
0.019136, -0.003297), PDiffEMA..C..100. = c(-0.099275, -0.10749,
-0.124316, -0.061964, -0.019515, -0.054829, -0.037535, -0.103405,
-0.109136, -0.114795, -0.077913, -0.0702, -0.030246, 0.021007,
0.050924, 0.050572, 0.102323, 0.098286, 0.028976, 0.181165, 0.149558,
0.097734, 0.080706, 0.062797, 0.09865, 0.116274, 0.096454, 0.106667,
0.104335, 0.077546), PDiffEMA..C..15. = c(0.040617, 0.025189,
0.002923, 0.063291, 0.09577, 0.047913, 0.057517, -0.014829, -0.020429,
-0.025383, 0.011905, 0.01652, 0.051724, 0.093053, 0.108744, 0.094524,
0.129525, 0.10992, 0.035246, 0.164669, 0.117805, 0.060309, 0.039645,
0.020689, 0.04972, 0.059919, 0.037539, 0.043015, 0.037437, 0.012106
), PDiffEMA..C..150. = c(-0.093172, -0.10216, -0.11988, -0.057662,
-0.015205, -0.051074, -0.034004, -0.100777, -0.107207, -0.113581,
-0.077145, -0.069872, -0.030105, 0.021298, 0.051577, 0.051572,
0.104112, 0.100766, 0.031468, 0.185417, 0.154788, 0.103376, 0.086768,
0.069123, 0.105824, 0.124328, 0.104963, 0.115925, 0.114214, 0.08761
), PDiffEMA..C..2. = c(0.026596, 0.00496, -0.005546, 0.020439,
0.021133, -0.005641, 0.003898, -0.023289, -0.010783, -0.006531,
0.010838, 0.00588, 0.015649, 0.022227, 0.017145, 0.005878, 0.018494,
0.005532, -0.019885, 0.039654, 0.004918, -0.013167, -0.009126,
-0.008236, 0.008927, 0.00901, -0.002334, 0.003026, 0.001007,
-0.007354), PDiffEMA..C..20. = c(0.023494, 0.01081, -0.009719,
0.053501, 0.09028, 0.044888, 0.056839, -0.015898, -0.022072,
-0.027705, 0.010135, 0.015484, 0.052611, 0.097353, 0.116772,
0.105168, 0.144353, 0.127141, 0.050961, 0.186998, 0.14148, 0.082672,
0.060787, 0.040121, 0.069478, 0.080018, 0.056583, 0.061777, 0.055567,
0.028484), PDiffEMA..C..200. = c(-0.068742, -0.078508, -0.097264,
-0.033866, 0.009361, -0.027797, -0.010654, -0.079549, -0.08665,
-0.093699, -0.056883, -0.049865, -0.009547, 0.042795, 0.07366,
0.073603, 0.127377, 0.124079, 0.053195, 0.210835, 0.179869, 0.127461,
0.110549, 0.092516, 0.130148, 0.149253, 0.129579, 0.140945, 0.139347,
0.112197), SDev..C..10. = c(2.45958, 2.926513, 2.845093, 3.756374,
5.10219, 5.081868, 5.265783, 4.826626, 4.291157, 3.6295, 3.616951,
3.548322, 3.416113, 4.265086, 5.237565, 6.098094, 7.574411, 8.106913,
7.438958, 8.134102, 7.922738, 6.598096, 5.383846, 4.80761, 4.536255,
4.222188, 4.213449, 4.221269, 3.064969, 2.264761), SDev..C..100. = c(13.805593,
13.920931, 14.058221, 14.112652, 14.14233, 14.192877, 14.235045,
14.342703, 14.456662, 14.576194, 14.652769, 14.716173, 14.741869,
14.741427, 14.739052, 14.739403, 14.741696, 14.72701, 14.69742,
14.715059, 14.70022, 14.683246, 14.596897, 14.496483, 14.430424,
14.416019, 14.345137, 14.295151, 14.236519, 14.14409), SDev..C..15. =
c(3.810122,
3.727714, 3.420287, 3.488475, 4.557784, 4.940561, 5.411468, 5.152154,
4.963443, 4.826029, 4.495605, 4.327133, 4.217027, 4.473102, 4.804435,
5.44519, 6.496722, 6.980111, 7.051635, 8.743879, 9.554652, 9.794502,
9.39961, 8.65292, 7.64887, 6.855349, 5.630696, 4.653948, 4.153742,
3.841768), SDev..C..150. = c(14.809111, 14.831458, 14.871543,
14.875049, 14.86221, 14.85398, 14.839834, 14.839124, 14.83275,
14.800584, 14.744953, 14.70903, 14.680755, 14.634937, 14.583621,
14.535156, 14.49157, 14.425511, 14.354034, 14.308416, 14.160925,
13.939813, 13.738813, 13.549968, 13.402108, 13.345101, 13.266162,
13.220572, 13.159213, 13.071028), SDev..C..20. = c(4.490554,
4.136249, 3.827253, 3.827253, 4.433983, 4.677989, 4.975351, 4.906121,
4.751858, 4.726276, 4.738339, 4.787789, 4.796663, 5.260255, 5.922471,
6.230072, 7.061402, 7.532933, 7.293521, 8.285976, 9.020732, 9.136875,
8.933207, 8.906268, 9.121289, 9.26624, 9.309381, 8.905484, 8.244527,
7.159201), SDev..C..200. = c(13.937699, 13.932049, 13.915857,
13.87102, 13.808371, 13.757481, 13.744457, 13.752084, 13.758076,
13.783514, 13.798682, 13.813838, 13.81888, 13.816158, 13.812102,
13.812203, 13.812013, 13.792836, 13.771866, 13.812937, 13.841213,
13.850953, 13.807815, 13.777951, 13.728936, 13.694142, 13.64009,
13.572522, 13.563963, 13.540619), SDev..C..30. = c(6.659525,
6.312348, 6.205628, 6.185313, 6.276642, 6.153, 5.885621, 5.692941,
5.318038, 4.77963, 4.515988, 4.380862, 4.444435, 4.856535, 5.515185,
6.06977, 6.968752, 7.701403, 7.940274, 9.207252, 9.917679, 10.233194,
10.249308, 10.195807, 10.272199, 10.266021, 10.206979, 10.081089,
9.856419, 9.433817), SDev..C..300. = c(16.797963, 16.701622,
16.613187, 16.517589, 16.42642, 16.327529, 16.224822, 16.110968,
16.013515, 15.919686, 15.832194, 15.738236, 15.647027, 15.556436,
15.464073, 15.356872, 15.267237, 15.158701, 15.029432, 14.953765,
14.873406, 14.774429, 14.694815, 14.599113, 14.495635, 14.415466,
14.375168, 14.332271, 14.278468, 14.21763), SDev..C..5. = c(2.964954,
3.467655, 3.229029, 3.743404, 3.972414, 4.012618, 3.804355, 3.185228,
4.097424, 3.676871, 3.257195, 1.740488, 3.116317, 4.908958, 5.445399,
5.221577, 4.870424, 3.613789, 3.198867, 5.955117, 5.904707, 5.869594,
5.915874, 4.5131, 3.000012, 2.17424, 2.235985, 2.256934, 0.873212,
1.219631), SDev..C..50. = c(10.343699, 10.196541, 10.16258, 9.852793,
9.464534, 8.984178, 8.524337, 8.265461, 7.711857, 7.41325, 7.18623,
7.024682, 6.807552, 6.873778, 6.882958, 6.825507, 7.090114, 7.357327,
7.350076, 7.864016, 8.263432, 8.462081, 8.672223, 8.837325, 9.106306,
9.417383, 9.621208, 9.885152, 10.115859, 10.252854), Target8 = c(0.1998,
0.81849, 1.41144, 1.41144, 1.41144, 1.41144, 1.028738, 1.028738,
0.435787, 0.435787, 0.435787, 0.896595, 1.799078, 1.799078, 2.083291,
2.11517, 2.11517, 2.036645, 2.036645, 1.720176, 0.389911, -0.337575,
-0.726615, -0.726615, -0.337575, -0.068188, -0.068188, 0.034502,
0.144095, 0.1998)), .Names = c("ATR.200.", "ATR.30.", "ATR.50.",
"PDiffEMA..C..10.", "PDiffEMA..C..100.", "PDiffEMA..C..15.",
"PDiffEMA..C..150.", "PDiffEMA..C..2.", "PDiffEMA..C..20.", "PDiffEMA..C..200.",
"SDev..C..10.", "SDev..C..100.", "SDev..C..15.", "SDev..C..150.",
"SDev..C..20.", "SDev..C..200.", "SDev..C..30.", "SDev..C..300.",
"SDev..C..5.", "SDev..C..50.", "Target8"), class = "data.frame",
row.names = c(NA,
-30L))

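# Predictors: all columns except the last; target: the last column (Target8)
IndData    = Test_Data[, -ncol(Test_Data)]
TargetData = Test_Data[, ncol(Test_Data), drop = FALSE]

# Fit an additive MARS model, pruned to at most 5 terms (including the intercept)
model_earth = earth(IndData, TargetData, nprune=5)
# Print the model as an equation of pmax() hinge functions
summary(model_earth, digits = 8, style = "pmax")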
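Apart from `nprune`, the call above uses earth's defaults. Several `earth()` arguments govern where knots may be placed and when the forward pass stops, and are natural things to vary when trying to move closer to another MARS implementation: `minspan`/`endspan` (minimum number of observations between knots and before the first/after the last knot), `thresh` (forward-pass stopping threshold on the improvement in RSq), `nk` (maximum number of terms in the forward pass), `penalty` (GCV penalty per knot) and `degree` (maximum interaction degree). The sketch below refits the same data under a few alternative settings; the helper name `fit_variants` is hypothetical and the argument values are illustrative guesses, not settings known to reproduce SPM's output. `trace = 1` prints each forward pass so term selection can be compared step by step.

```r
# Refit the same x/y under a few alternative earth() settings so the
# selected terms and knots can be compared with each other (and with SPM).
# Argument values are illustrative guesses, not known SPM equivalents.
fit_variants <- function(x, y, nprune = 5, trace = 0) {
  list(
    # earth defaults apart from nprune, as in the script above
    default = earth::earth(x, y, nprune = nprune, trace = trace),
    # allow knots at every observation, including near the ends of the range
    all_knots = earth::earth(x, y, nprune = nprune, trace = trace,
                             minspan = 1, endspan = 1),
    # make the forward pass less eager to stop before pruning
    low_thresh = earth::earth(x, y, nprune = nprune, trace = trace,
                              thresh = 1e-5)
  )
}
```

With the data above, `fit_variants(IndData, TargetData, trace = 1)` prints each forward pass, and calling `summary(f, digits = 8, style = "pmax")` on each element of the returned list shows the resulting equations.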


