
studyStrap Package

The studyStrap package implements multi-study learning algorithms such as Merging, Study-Specific Ensembling (Trained-on-Observed-Studies Ensemble), the Study Strap, and the Covariate-Matched Study Strap. It calculates and applies Covariate Profile Similarity and stacking weights. Because models are trained within the caret ecosystem, the package can flexibly use different methods (e.g., random forests, linear regression, neural networks) as single-study learners (SSLs) within the multi-study ensembling framework. The package allows multiple single-study learners per study, as well as custom functions for Covariate Profile Similarity weighting and for the accept/reject step of the Covariate-Matched Study Strap. The prediction function applies this framework without the need to manually ensemble and weight model predictions.


Below we offer a few basic examples using the core functions of the package. We begin by simulating a multi-study prediction setting.

Generate data and import packages

set.seed(1)
library(studyStrap)
# create half of training dataset from 1 distribution
X1 <- matrix(rnorm(2000), ncol = 2) # design matrix - 2 covariates
B1 <- c(5, 10, 15) # true beta coefficients
y1 <- cbind(1, X1) %*% B1

# create 2nd half of training dataset from another distribution
X2 <- matrix(rnorm(2000, 1,2), ncol = 2) # design matrix - 2 covariates
B2 <- c(10, 5, 0) # true beta coefficients
y2 <- cbind(1, X2) %*% B2

X <- rbind(X1, X2)
y <- c(y1, y2)

study <- sample.int(10, 2000, replace = TRUE) # 10 studies
data <- data.frame( Study = study, Y = y, V1 = X[,1], V2 = X[,2] )


# create target study design matrix for covariate profile similarity weighting and 
# accept/reject algorithm (Covariate-matched study strap)
target <- matrix(rnorm(1000, 3, 5), ncol = 2) # design matrix
colnames(target) <- c("V1", "V2")

Structure of Data

We have 10 studies combined into a single data frame, in which the Study column identifies each observation's study. Each study has an outcome Y and two covariates, V1 and V2.

head(data)
##   Study          Y         V1          V2
## 1     6  15.759938 -0.6264538  1.13496509
## 2     1  23.515411  0.1836433  1.11193185
## 3    10 -16.417951 -0.8356286 -0.87077763
## 4     6  24.113782  1.5952808  0.21073159
## 5     7   9.336012  0.3295078  0.06939565
## 6     6 -28.144417 -0.8204684 -1.66264885

Study-Specific Ensemble (Trained-on-Observed-Studies Ensemble)

We begin with the basic ensembling setting (the Study-Specific Ensemble or Trained-on-Observed-Studies Ensemble) where we train one or more models on each study and then ensemble the models.
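To make the idea concrete, here is a minimal base-R sketch of a study-specific ensemble that does not use studyStrap at all: it fits one lm() per study on simulated data (the data-generating choices are illustrative assumptions) and averages the study-specific predictions with equal weights, which is analogous to the Avg column reported by the package.

```r
# Sketch of a study-specific ensemble in base R (not the studyStrap implementation)
set.seed(1)
n_studies <- 5

# Simulate 5 noise-free studies sharing slopes but with intercepts 1, ..., 5
sim_data <- do.call(rbind, lapply(seq_len(n_studies), function(s) {
  x <- matrix(rnorm(200), ncol = 2)
  beta <- c(s, 2, -1)
  data.frame(Study = s, Y = drop(cbind(1, x) %*% beta), V1 = x[, 1], V2 = x[, 2])
}))

# Train one model per study
fits <- lapply(split(sim_data, sim_data$Study),
               function(d) lm(Y ~ V1 + V2, data = d))

# Prediction matrix: rows = new observations, columns = study-specific models
newdata <- data.frame(V1 = c(0, 1), V2 = c(0, 1))
pred_mat <- sapply(fits, predict, newdata = newdata)

# Equal-weight ensemble (analogous to the "Avg" column)
avg_pred <- rowMeans(pred_mat)
print(avg_pred)
```

Because the simulated data contain no noise, each study model recovers its own coefficients, so the averaged predictions are 3 and 4 (the mean intercept plus the shared slope terms).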

Study-Specific Ensemble with 1 SSL: Principal Component Regression

Here we use only one single-study learner (SSL): principal component regression (PCR). We assume the model has already been tuned and that its tuning parameters are specified as they would be in caret. We also show an example of a custom function for Covariate Profile Similarity weighting, but this is optional.

We also specify a target study to enable Covariate Profile Similarity weighting. This is optional as well; an example without a target study is shown below.

# custom function
fn1 <- function(x1,x2){
    return( abs( cor( colMeans(x1), colMeans(x2) )) )
    } 

sseMod1 <- sse(formula = Y ~., 
               data = data, 
               target.study = target,
               ssl.method = list("pcr"), 
               ssl.tuneGrid = list(data.frame("ncomp" = 1)), 
               customFNs = list(fn1) )

Make predictions with Study-Specific Ensemble (Trained-on-Observed-Studies Ensemble)

preds <- studyStrap.predict(sseMod1, target)
head(preds)[1:3,]
##             Avg standard_Stacking customFn_1
## [1,]  0.1774518        -13.653802  0.1774518
## [2,]  5.9290711         -4.472652  5.9290711
## [3,] 30.8083742         35.331259 30.8083742

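The predictions are returned as a matrix with one column per weighting scheme: the first column (Avg) is a simple average of the predictions from all of the models, followed by the stacking predictions (standard_Stacking) and one column per custom weighting function (customFn_1). By design, the custom function fn1 reproduces the package's “Mean Corr” weights (compare the Mean Corr and customFn_1 columns in the Study Strap output below). With only two covariates, however, the correlation between two length-2 mean vectors is always ±1, so fn1 gives every study the same weight and customFn_1 coincides with Avg.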
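The equal-weighting effect can be checked directly in base R. The sketch below applies the vignette's fn1 to ten simulated two-covariate "studies" and a target matrix; the study means are illustrative, and normalizing the scores to sum to one is an assumption that matches the 0.1 entries in the simMat output shown later.

```r
# The vignette's custom similarity: |cor| between covariate mean vectors
fn1 <- function(x1, x2) {
  abs(cor(colMeans(x1), colMeans(x2)))
}

set.seed(1)
target <- matrix(rnorm(1000, 3, 5), ncol = 2)

# Ten simulated studies, each a 100 x 2 design matrix with different column means
studies <- lapply(1:10, function(k) {
  cbind(rnorm(100, mean = k), rnorm(100, mean = -k))
})

# Each mean vector has length 2, so the correlation is always +1 or -1
sims <- sapply(studies, fn1, x2 = target)

# Normalize scores to weights that sum to one (assumed normalization)
weights <- sims / sum(sims)
print(round(weights, 3))
```

Any two points lie on a line, which is why a correlation computed from two values carries no information; with more covariates fn1 would discriminate between studies.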

Study-Specific Ensemble (Trained-on-Observed-Studies Ensemble) with 2 SSLs

We now run the same algorithm, but for each study we train two models: a linear regression and a PCR model.

# custom function
fn1 <- function(x1,x2){
    return( abs( cor( colMeans(x1), colMeans(x2) )) )
    } 

sseMod2 <- sse(formula = Y ~., 
               data = data, 
               target.study = target,
               ssl.method = list("lm","pcr"), 
               ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)), 
               customFNs = list(fn1) )

Make predictions with Study-Specific Ensemble (Trained-on-Observed-Studies Ensemble) with 2 SSLs

Prediction works exactly as before and returns output with the same structure. The prediction function automatically accounts for the fact that each study now has two models, one trained with linear regression and one with PCR. Covariate Profile Similarity weights give equal weight to the two models trained on the same study.

preds <- studyStrap.predict(sseMod2, target)
head(preds)[1:3,]
##             Avg standard_Stacking customFn_1
## [1,] -13.995010        -14.066971 -13.995010
## [2,]  -4.730832         -4.792895  -4.730832
## [3,]  35.433599         35.415095  35.433599
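One simple way to realize the equal weighting of the two models trained on the same study is to replicate each study's similarity score once per SSL and renormalize. The sketch below assumes exactly that scheme and an SSL-major ordering of the models (all studies for the first SSL, then all studies for the second); the similarity values are made up for illustration and are not produced by the package.

```r
# Hypothetical similarity scores for 3 studies (illustrative values)
sim_by_study <- c(0.5, 0.3, 0.2)
n_ssl <- 2  # e.g., lm and pcr

# One weight per model: repeat the study scores for each SSL (SSL-major order),
# then renormalize so that all model weights sum to one
w <- rep(sim_by_study, times = n_ssl)
w <- w / sum(w)
print(w)
```

The lm and PCR models from the same study end up with identical weights, and each study's total weight is unchanged from the single-SSL case.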


Now suppose we do not have a target study with which to generate Covariate Profile Similarity weights. In that case we omit target.study and set sim.mets = FALSE.


sseMod3 <- sse(formula = Y ~., 
               data = data,
               ssl.method = list("pcr"), 
               ssl.tuneGrid = list(data.frame("ncomp" = 1)), # one tuning grid per SSL
               sim.mets = FALSE)

preds <- studyStrap.predict(sseMod3, target)
head(preds)[1:3,]
##             Avg standard_Stacking
## [1,]  0.1774518        -13.653802
## [2,]  5.9290711         -4.472652
## [3,] 30.8083742         35.331259


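Since we do not have a target study, we cannot generate Covariate Profile Similarity weights, and the predictions are limited to stacking and simple averaging. These match the Avg and standard_Stacking columns from sseMod1, because the same PCR models are fit.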
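Stacking learns the combination weights from the data rather than fixing them in advance. A common formulation, assumed here because the vignette does not spell out the package's exact procedure, regresses the outcome on the models' predictions subject to non-negativity constraints. The base-R sketch below solves that non-negative least-squares problem with optim() using L-BFGS-B bounds instead of a dedicated NNLS solver; the prediction matrix is simulated.

```r
set.seed(2)
n <- 200
y <- rnorm(n)

# Simulated predictions from 3 models: accurate, noisy, and uninformative
P <- cbind(accurate = y + rnorm(n, sd = 0.1),
           noisy    = y + rnorm(n, sd = 1),
           useless  = rnorm(n))

# Residual sum of squares and its gradient with respect to the stacking weights
rss  <- function(w) sum((y - P %*% w)^2)
grad <- function(w) -2 * drop(crossprod(P, y - P %*% w))

# Non-negative least squares via box-constrained optimization (all weights >= 0)
fit <- optim(par = rep(1 / ncol(P), ncol(P)), fn = rss, gr = grad,
             method = "L-BFGS-B", lower = 0)
stack_w <- setNames(fit$par, colnames(P))
print(round(stack_w, 3))

# Stacked prediction for one new observation's model outputs
new_P <- rbind(c(1.0, 1.2, -0.3))
stacked_pred <- drop(new_P %*% stack_w)
```

Nearly all of the weight goes to the accurate model, which mirrors the sparse stack.coefs vector shown for sseMod1 later in this vignette, where most coefficients are exactly zero.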


Now let us move on to another standard multi-study learning method, Merging, which pools all studies into a single training set:

Merged Approach

Merged with 1 SSL and 2 SSLs

# 1 SSL
mrgMod1 <- merged(formula = Y ~.,
                  data = data,   
                  ssl.method = list("pcr"), 
                  ssl.tuneGrid = list( data.frame("ncomp" = 2))
                  )

# 2 SSLs
mrgMod2 <- merged(formula = Y ~.,
                  data = data,  
                  ssl.method = list("lm","pcr"), 
                  ssl.tuneGrid = list(NA, data.frame("ncomp" = 2))
                  )


Merging does not use stacking or similarity weights, so the predictions form a single column, Avg; the NA_Stacking column contains only NA values.

preds <- studyStrap.predict(mrgMod2, target)
head(preds)
##             Avg NA_Stacking
## [1,] -14.066971          NA
## [2,]  -4.792895          NA
## [3,]  35.415095          NA
## [4,]  53.204946          NA
## [5,]  60.384128          NA
## [6,]  29.725637          NA
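Conceptually, merging ignores study membership: all rows are pooled and a single model is fit per SSL. The base-R sketch below illustrates this with lm() standing in for an SSL on simulated data; the Study column is deliberately left out of the formula so that it is not treated as a covariate.

```r
set.seed(1)
pooled <- data.frame(Study = rep(1:4, each = 50),
                     V1 = rnorm(200), V2 = rnorm(200))
pooled$Y <- 1 + 2 * pooled$V1 - pooled$V2   # same true model in every study

# Merged approach: one model on all studies, ignoring the Study label
merged_fit <- lm(Y ~ V1 + V2, data = pooled)

pred <- predict(merged_fit, newdata = data.frame(V1 = 1, V2 = 1))
print(pred)
```

Since the simulated studies share one noise-free model, the pooled fit recovers it exactly and predicts 1 + 2 - 1 = 2. When studies differ, as in the vignette's data, a single merged model can only capture their average relationship.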

Study Strap

We now demonstrate the use of the Study Strap with 10 straps and all available weighting schemes.
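The resampling behind a single study strap can be sketched in base R. This is a simplified illustration under our own assumptions about the scheme, not the package's code: a bag of bag_size studies is drawn with replacement, and a study drawn c times contributes a random c / bag_size fraction of its rows, sampled without replacement. The helper draw_study_strap() is hypothetical.

```r
# Hypothetical helper: draw one study-strap replicate (simplified, assumed scheme)
draw_study_strap <- function(data, bag_size) {
  study_ids <- unique(data$Study)
  bag <- sample(study_ids, bag_size, replace = TRUE)   # bag of studies, with replacement
  counts <- table(bag)                                 # times each study was drawn
  rows <- unlist(lapply(names(counts), function(s) {
    idx <- which(as.character(data$Study) == s)
    # a study drawn c times keeps a c / bag_size fraction of its rows
    n_keep <- ceiling(counts[[s]] * length(idx) / bag_size)
    idx[sample.int(length(idx), n_keep)]              # sample rows without replacement
  }))
  data[rows, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(Study = rep(1:10, each = 20), Y = rnorm(200))
strap <- draw_study_strap(toy, bag_size = 10)
table(strap$Study)  # studies drawn more often contribute more rows
```

With ten balanced studies of 20 rows and bag_size = 10, each draw of a study contributes two rows, so every replicate has 20 rows but a different mix of studies.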

Study Strap with 1 and 2 SSLs

# custom function
fn1 <- function(x1,x2){
    return( abs( cor( colMeans(x1), colMeans(x2) )) )
    } 

# 1 SSL
ssMod1 <- ss(formula = Y ~.,
             data = data,  
             target.study = target,
             bag.size = length(unique(data$Study)), 
             straps = 10, 
             stack = "standard",
             sim.covs = NA, 
             ssl.method = list("pcr"), 
             ssl.tuneGrid = list(data.frame("ncomp" = 2)), 
             sim.mets = TRUE,
             model = TRUE, 
             customFNs = list( fn1 ) )

# 2 SSLs
ssMod2 <- ss(formula = Y ~., 
             data = data, 
             target.study = target,
             bag.size = length(unique(data$Study)), 
             straps = 10, 
             stack = "standard",
             sim.covs = NA, 
             ssl.method = list("lm","pcr"), 
             ssl.tuneGrid = list(NA, data.frame("ncomp" = 2)), 
             sim.mets = TRUE,
             model = TRUE, 
             customFNs = list( fn1 ) )


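Predictions have the same structure as for the Study-Specific Ensemble. Because sim.mets = TRUE, the matrix now also contains one column for each of the package's default Covariate Profile Similarity measures (Matcor Diag through GCD), in addition to Avg, standard_Stacking, and customFn_1.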


preds <- studyStrap.predict(ssMod2, target)
head(preds)[1:3,]
##             Avg standard_Stacking Matcor Diag Matcor Sum Matcor Sum Abs
## [1,] -14.627272        -14.066971  -14.284018 -14.452172      -14.33067
## [2,]  -5.138826         -4.792895   -4.828864  -4.823478       -4.86358
## [3,]  35.998061         35.415095   36.161704  36.918680       36.17842
##           |rho|     rho sq  UV rho sq  UV cov sq     UV rho     UV cov
## [1,] -14.211107 -14.099586 -14.740194 -14.822641 -14.956572 -15.330968
## [2,]  -4.776769  -4.663318  -5.269537  -5.334426  -5.438017  -5.734821
## [3,]  36.123186  36.243555  35.789701  35.801652  35.830304  35.874078
##      diag UV rho sq diag UV cov diag UV cov sq  Mean Corr        SMI         RV
## [1,]     -14.740194  -14.828585     -14.638752 -14.627272 -14.112789 -14.197656
## [2,]      -5.269537   -5.320772      -5.189034  -5.138826  -4.673056  -4.735643
## [3,]      35.789701   35.900237      35.777924  35.998061  36.248897  36.283369
##             RV2      RVadj        PSI         r1         r2         r3
## [1,] -15.360730 -15.353452 -14.239038 -13.350311 -13.117915 -13.273435
## [2,]  -5.662882  -5.660354  -4.799337  -3.997512  -3.938409  -3.938728
## [3,]  36.383972  36.366180  36.124056  36.543014  35.852250  36.522881
##              r4        GCD customFn_1
## [1,] -13.018795 -14.112789 -14.627272
## [2,]  -3.861005  -4.673056  -5.138826
## [3,]  35.834974  36.248897  35.998061


Now suppose we do not want to use the package's default Covariate Profile Similarity measures. Setting sim.mets = FALSE turns them off, which can substantially reduce model-fitting time and changes the structure of the prediction output; custom functions supplied through customFNs are still used. We must also specify the bag size (bag.size). The default is the number of training studies, but this parameter should be tuned for optimal performance.

# custom function
fn1 <- function(x1,x2){
    return( abs( cor( colMeans(x1), colMeans(x2) )) )
    } 

ssMod3 <- ss(formula = Y ~., 
             data = data, 
             target.study = target,
             bag.size = length(unique(data$Study)), 
             straps = 10, 
             sim.covs = NA, ssl.method = list("pcr"), 
             ssl.tuneGrid = list(data.frame("ncomp" = 2)), 
             sim.mets = FALSE,
             customFNs = list( fn1 ) )

preds <- studyStrap.predict(ssMod3, target)
head(preds)[1:3,]
##            Avg standard_Stacking customFn_1
## [1,] -12.97376        -14.066971  -12.97376
## [2,]  -4.03119         -4.792895   -4.03119
## [3,]  34.73606         35.415095   34.73606


Now consider the case where we have no target study at all. We simply omit the target.study argument (along with the custom weighting functions), and the predictions are limited to the simple average and stacking.


ssMod4 <- ss(formula = Y ~., 
             data = data, 
             bag.size = length(unique(data$Study)), 
             straps = 10, 
             sim.covs = NA, ssl.method = list("pcr"), 
             ssl.tuneGrid = list(data.frame("ncomp" = 2)), 
             sim.mets = FALSE)

preds <- studyStrap.predict(ssMod4, target)
head(preds)[1:3,]
##            Avg standard_Stacking
## [1,] -12.27087        -14.066971
## [2,]  -3.45521         -4.792895
## [3,]  34.75931         35.415095

Covariate-Matched Study Strap (Accept/Reject)

Now we turn to the accept/reject algorithm, which requires a target study. We need to specify the number of paths (paths; we recommend 5) and the convergence limit (converge.lim), i.e., the number of consecutive rejected study straps required to declare convergence. The convergence limit is constrained by computational cost, but we recommend at least 1000, and more is better; here we choose a low value for demonstration purposes. The accept/reject step can use a custom function (sim.fn) or the default criterion $|\mathrm{cor}(\bar{x}^{(r)}, \bar{x}_{\mathrm{target}})|$, where $\bar{x}^{(r)}$ is the vector of covariate means of candidate study strap $r$ and $\bar{x}_{\mathrm{target}}$ is that of the target study. Custom functions for Covariate Profile Similarity weighting can be supplied as above. We also specify the maximum total number of study straps (max.straps) in case many straps are accepted without convergence. We recommend 50 straps per path to be safe, but the right value is application specific and depends on the distribution of the covariates.


We could use 1 SSL or multiple SSLs as above. We need to specify the bag size as in the Study Strap algorithm. The default is to use the number of training studies, but this must be tuned for optimal performance.
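The accept/reject loop of a single path can be sketched in base R. This is a simplified illustration, not the package's implementation: proposals here keep all rows of the bagged studies, and we assume a greedy rule that accepts a proposed strap only if it is more similar to the target than the current one, stopping after converge_lim consecutive rejections or max_straps accepted straps. All helper names are hypothetical. As the similarity we use the negative squared distance between covariate mean vectors, because with only two covariates the default |cor| criterion equals 1 for every strap and cannot discriminate between them.

```r
# Hypothetical similarity: negative squared distance between covariate means
# (larger values = more similar)
sim_fn <- function(x1, x2) -sum((colMeans(x1) - colMeans(x2))^2)

# Hypothetical proposal: bag studies with replacement, keep all rows of bagged studies
propose_strap <- function(data, bag_size) {
  bag <- sample(unique(data$Study), bag_size, replace = TRUE)
  data[data$Study %in% bag, , drop = FALSE]
}

# One accept/reject path under a greedy "accept only if more similar" rule
accept_reject_path <- function(data, target, bag_size, converge_lim, max_straps,
                               covs = c("V1", "V2")) {
  accepted <- list()
  current_sim <- -Inf
  n_reject <- 0
  while (n_reject < converge_lim && length(accepted) < max_straps) {
    cand <- propose_strap(data, bag_size)
    s <- sim_fn(as.matrix(cand[, covs]), target)
    if (s > current_sim) {
      accepted[[length(accepted) + 1]] <- cand  # accept and reset rejection counter
      current_sim <- s
      n_reject <- 0
    } else {
      n_reject <- n_reject + 1                  # count consecutive rejections
    }
  }
  list(straps = accepted, final_sim = current_sim)
}

set.seed(1)
toy <- data.frame(Study = rep(1:10, each = 20),
                  V1 = rnorm(200, mean = rep(1:10, each = 20)),
                  V2 = rnorm(200))
target <- cbind(V1 = rnorm(100, mean = 8), V2 = rnorm(100))

path <- accept_reject_path(toy, target, bag_size = 3,
                           converge_lim = 20, max_straps = 50)
length(path$straps)  # number of accepted study straps on this path
```

Under this greedy rule each accepted strap is strictly closer to the target than the previous one, so the accepted straps drift toward studies whose covariate means resemble the target's.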

Covariate-Matched Study Strap with 1 and 2 SSLs

# 1 SSL
arMod1 <-  cmss(formula = Y ~., 
                data = data, 
                target.study = target,
                converge.lim = 2,
                bag.size = length(unique(data$Study)), 
                max.straps = 50, 
                paths = 2, 
                ssl.method = list("pcr"), 
                ssl.tuneGrid = list(data.frame("ncomp" = 2))
                )

# 2 SSLs
arMod2 <-  cmss(formula = Y ~., 
                data = data, 
                target.study = target,
                converge.lim = 2,
                bag.size = length(unique(data$Study)), 
                max.straps = 50, 
                paths = 2, 
                ssl.method = list("lm","pcr"), 
                ssl.tuneGrid = list(NA, data.frame("ncomp" = 2))
                )

preds <- studyStrap.predict(arMod2, target)
head(preds)[1:3,]
##             Avg standard_Stacking Matcor Diag Matcor Sum Matcor Sum Abs
## [1,] -14.040566        -13.760921  -13.920651 -13.889937     -14.145602
## [2,]  -4.813971         -4.555755   -4.730099  -4.683699      -4.897403
## [3,]  35.192579         35.352613   35.119938  35.232755      35.203636
##           |rho|     rho sq  UV rho sq  UV cov sq     UV rho     UV cov
## [1,] -14.169412 -14.249983 -14.442813 -14.443724 -14.407886 -13.747953
## [2,]  -4.914306  -4.980559  -5.146644  -5.147101  -5.113404  -4.595961
## [3,]  35.216743  35.213340  35.165515  35.167016  35.190825  35.085728
##      diag UV rho sq diag UV cov diag UV cov sq  Mean Corr        SMI         RV
## [1,]     -14.442813  -13.719055     -14.001783 -14.040566 -14.253161 -14.245844
## [2,]      -5.146644   -4.615958      -4.836169  -4.813971  -4.984143  -4.974931
## [3,]      35.165515   34.856149      34.908938  35.192579  35.208081  35.225245
##             RV2      RVadj        PSI         r1         r2         r3
## [1,] -14.882410 -14.652337 -14.160104 -13.704759 -13.743827 -13.709580
## [2,]  -5.425419  -5.242989  -4.905941  -4.572621  -4.592698  -4.573953
## [3,]  35.583131  35.557210  35.220889  35.023320  35.085220  35.037006
##              r4        GCD
## [1,] -13.744843 -14.253161
## [2,]  -4.591946  -4.984143
## [3,]  35.093549  35.208081


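Next, we base the accept/reject step on our own custom function (sim.fn). We turn off the default Covariate Profile Similarity weights (sim.mets = FALSE) to reduce runtime, but provide two custom functions for Covariate Profile Similarity weighting. Note that fn2 is a squared Euclidean distance between covariate mean vectors, so smaller values indicate greater similarity, the opposite orientation from the default correlation criterion; check the package documentation for how sim.fn and customFNs values are interpreted before using a distance in practice.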

Covariate-Matched Study Strap with Custom Function for Accept/Reject Step

# 1 SSL

# custom function for CPS
fn1 <- function(x1,x2){
    return( abs( cor( colMeans(x1), colMeans(x2) )) )
} 

# custom function for Accept/Reject step criteria
fn2 <- function(x1,x2){
    return( sum ( ( colMeans(x1) - colMeans(x2) )^2 ) )
    } 

arMod3 <-  cmss(formula = Y ~., 
                data = data, 
                target.study = target,
                converge.lim = 2,
                bag.size = length(unique(data$Study)), 
                max.straps = 50, 
                paths = 2, 
                ssl.method = list("pcr"), 
                ssl.tuneGrid = list(data.frame("ncomp" = 2)),
                sim.mets = FALSE,
                sim.fn = fn2,
                customFNs = list( fn1, fn2 ) 
                )

preds <- studyStrap.predict(arMod3, target)
head(preds)[1:3,]
##             Avg standard_Stacking customFn_1 customFn_2
## [1,] -13.963306        -14.066971 -13.963306 -14.026422
## [2,]  -4.777185         -4.792895  -4.777185  -4.816016
## [3,]  35.052555         35.415095  35.052555  35.118968

Model Object Structure

Now that we understand how to fit models, let us take a moment to explore the model objects that the package produces. They are S3 objects (here of class "ss"), which means they are lists with a class attribute.

sseMod1
## $models
## $models[[1]]
## $models[[1]][[1]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[2]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[3]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[4]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[5]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[6]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[7]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[8]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[9]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## $models[[1]][[10]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## 
## 
## $data
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## list()
## 
## $dataInfo
## $dataInfo$studyNames
##  [1]  6  1 10  7  4  8  3  2  9  5
## 
## $dataInfo$sampleSizes
##  [1] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000
## 
## 
## $modelInfo
## $modelInfo$sampling
## [1] "studySpecificEnsemble"
## 
## $modelInfo$numStraps
## [1] 10
## 
## $modelInfo$SSL
## $modelInfo$SSL[[1]]
## [1] "pcr"
## 
## 
## $modelInfo$ssl.tuneGrid
## NULL
## 
## $modelInfo$numPaths
## [1] NA
## 
## $modelInfo$convg.vec
## NULL
## 
## $modelInfo$convgCritera
## [1] NA
## 
## $modelInfo$meanSamp
## [1] NA
## 
## $modelInfo$stack.type
## [1] "standard"
## 
## $modelInfo$custFNs
## $modelInfo$custFNs[[1]]
## function(x1,x2){
##     return( abs( cor( colMeans(x1), colMeans(x2) )) )
##     }
## <bytecode: 0x7fb8361996e0>
## 
## 
## $modelInfo$bagSize
## [1] 1
## 
## 
## [[7]]
## NULL
## 
## $simMat
##       customFn_1
##  [1,]        0.1
##  [2,]        0.1
##  [3,]        0.1
##  [4,]        0.1
##  [5,]        0.1
##  [6,]        0.1
##  [7,]        0.1
##  [8,]        0.1
##  [9,]        0.1
## [10,]        0.1
## 
## $stack.coefs
##  [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.84196123 0.00000000
##  [7] 0.00000000 0.09382132 0.00000000 0.00000000 0.00000000
## 
## $strapRows
## $strapRows$length
##   [1]    1    4    6   10   22   43   49   53   55   87  100  106  108  128  140
##  [16]  149  154  176  180  190  191  203  214  250  253  259  262  273  290  308
##  [31]  315  321  329  332  344  355  368  374  396  399  409  414  415  419  429
##  [46]  433  443  444  464  466  473  485  491  492  495  503  534  558  561  598
##  [61]  605  619  633  642  646  650  656  667  671  684  697  713  740  752  753
##  [76]  776  791  810  811  825  842  845  848  857  863  866  921  936  948  974
##  [91]  979  993  996 1014 1023 1038 1041 1048 1052 1064 1065 1067 1080 1086 1097
## [106] 1120 1130 1137 1159 1193 1199 1208 1214 1220 1225 1232 1234 1242 1250 1253
## [121] 1275 1286 1304 1305 1358 1360 1362 1368 1378 1406 1428 1430 1433 1453 1469
## [136] 1488 1508 1509 1519 1537 1539 1553 1573 1582 1587 1593 1600 1602 1625 1643
## [151] 1649 1651 1685 1696 1707 1720 1724 1737 1756 1779 1818 1836 1878 1887 1888
## [166] 1913 1917 1919 1934 1946 1956 1969 1974 1975 1976 1978
## 
## $strapRows[[2]]
##   [1]    2   13   21   29   33   34   44   47   52   57   59   61   76  104  109
##  [16]  118  121  123  135  137  158  179  200  217  233  237  256  274  279  280
##  [31]  288  314  317  324  328  333  349  354  366  370  375  376  389  397  398
##  [46]  400  436  452  467  493  508  516  520  523  535  549  554  567  584  586
##  [61]  592  595  604  625  661  665  685  687  692  696  701  703  722  734  745
##  [76]  748  757  764  768  778  781  789  815  818  833  838  840  876  888  889
##  [91]  911  916  917  919  930  931  951  953  957  964  969  981  985  999 1000
## [106] 1007 1021 1037 1049 1070 1071 1077 1091 1092 1096 1099 1100 1105 1124 1125
## [121] 1148 1162 1165 1167 1170 1182 1186 1198 1200 1210 1222 1244 1247 1251 1254
## [136] 1264 1280 1326 1331 1347 1353 1359 1366 1367 1372 1376 1391 1398 1400 1405
## [151] 1414 1422 1429 1441 1449 1451 1459 1474 1483 1484 1493 1498 1500 1501 1504
## [166] 1510 1533 1556 1561 1571 1576 1577 1588 1603 1627 1641 1642 1670 1671 1682
## [181] 1683 1689 1691 1706 1709 1712 1715 1717 1722 1740 1759 1762 1787 1792 1809
## [196] 1813 1821 1826 1835 1843 1845 1846 1852 1855 1866 1869 1891 1898 1900 1905
## [211] 1943 1945 1948 1953 1967 1968 1998 1999
## 
## $strapRows[[3]]
##   [1]    3   11   16   18   35   51   71   75   92   95   99  116  117  127  133
##  [16]  165  168  170  178  215  236  245  248  254  272  316  320  323  331  360
##  [31]  363  379  388  431  437  446  454  461  471  483  515  518  521  530  551
##  [46]  559  560  563  575  577  581  589  623  629  635  652  653  664  678  682
##  [61]  702  712  731  732  733  756  761  779  828  852  855  868  874  904  906
##  [76]  908  925  926  937  939  942  965  991  994 1001 1002 1024 1028 1029 1030
##  [91] 1036 1047 1055 1076 1082 1085 1088 1108 1112 1134 1142 1161 1180 1212 1213
## [106] 1233 1241 1246 1252 1255 1262 1283 1292 1332 1343 1351 1357 1375 1384 1403
## [121] 1418 1423 1432 1447 1456 1495 1514 1518 1522 1525 1527 1530 1542 1549 1552
## [136] 1575 1580 1585 1586 1598 1610 1619 1621 1628 1653 1657 1666 1693 1708 1728
## [151] 1738 1751 1757 1764 1766 1767 1775 1776 1777 1780 1782 1786 1808 1825 1830
## [166] 1834 1838 1851 1854 1858 1868 1870 1871 1880 1894 1901 1910 1918 1922 1923
## [181] 1925 1933 1937 1942 1957 1960 1965 1979 1980 1991
## 
## $strapRows[[4]]
##   [1]    5   41   42   62   67   68   74  102  105  111  113  120  124  148  157
##  [16]  160  166  171  184  185  195  207  209  211  213  230  232  234  252  255
##  [31]  265  289  291  306  325  338  345  353  356  382  387  391  392  394  402
##  [46]  405  418  430  432  435  440  445  449  450  469  472  476  477  480  489
##  [61]  538  540  541  544  553  569  585  639  647  648  663  666  670  672  673
##  [76]  695  708  715  718  723  728  738  741  762  772  775  777  783  792  794
##  [91]  795  797  800  803  808  823  824  841  860  865  872  885  890  894  899
## [106]  905  910  912  918  924  933  945  977  984 1013 1015 1017 1019 1022 1032
## [121] 1033 1035 1039 1051 1059 1069 1079 1087 1113 1135 1143 1145 1158 1189 1201
## [136] 1205 1209 1219 1223 1231 1235 1267 1268 1272 1306 1307 1316 1320 1325 1330
## [151] 1335 1365 1385 1402 1411 1412 1439 1448 1452 1457 1468 1478 1487 1489 1497
## [166] 1499 1505 1507 1538 1543 1566 1569 1570 1599 1601 1611 1624 1631 1634 1636
## [181] 1645 1648 1654 1655 1660 1672 1694 1700 1704 1729 1731 1749 1771 1781 1785
## [196] 1804 1806 1814 1817 1837 1847 1856 1862 1865 1875 1882 1885 1897 1899 1911
## [211] 1926 1931 1947 1950 1951 1963 1977 1981 1986 1994 2000
## 
## $strapRows[[5]]
##   [1]    7   23   26   31   50   54   66   72   73   83   90   91   93   98  115
##  [16]  136  142  159  162  163  169  187  198  205  218  226  227  239  257  261
##  [31]  263  266  281  285  301  327  336  343  347  352  365  377  381  383  393
##  [46]  406  412  413  427  434  439  478  484  488  497  502  504  510  525  533
##  [61]  555  571  579  580  583  596  597  601  607  608  614  618  624  626  627
##  [76]  630  640  660  674  679  691  693  694  707  709  714  729  739  742  750
##  [91]  755  769  774  782  784  786  788  796  831  846  850  877  882  896  900
## [106]  909  920  922  932  941  943  950  954  961  972  992 1016 1026 1034 1040
## [121] 1043 1058 1068 1101 1109 1110 1117 1121 1131 1147 1173 1175 1185 1204 1207
## [136] 1211 1216 1227 1228 1257 1273 1295 1296 1301 1311 1312 1322 1338 1381 1382
## [151] 1386 1394 1396 1401 1413 1417 1435 1467 1475 1476 1494 1502 1512 1520 1521
## [166] 1528 1534 1544 1545 1546 1594 1617 1626 1632 1635 1647 1650 1652 1658 1687
## [181] 1697 1711 1714 1719 1733 1746 1753 1774 1788 1791 1795 1805 1857 1859 1883
## [196] 1912 1914 1915 1916 1924 1927 1938 1939 1966 1970 1972 1983 1984 1996
## 
## $strapRows[[6]]
##   [1]    8   12   40   65   69   89   96  119  131  161  173  193  199  223  224
##  [16]  225  228  240  244  283  293  297  300  304  311  348  373  384  395  403
##  [31]  428  438  441  442  451  456  459  462  470  475  499  511  528  536  543
##  [46]  548  556  557  582  599  600  606  609  610  613  655  669  686  705  743
##  [61]  758  759  780  787  802  805  809  812  814  820  834  839  843  851  853
##  [76]  856  870  887  895  914  923  927  928  947  952  968  975  976  990  997
##  [91] 1003 1004 1009 1020 1027 1031 1042 1054 1074 1083 1093 1094 1104 1166 1169
## [106] 1172 1179 1183 1188 1192 1243 1245 1256 1259 1261 1266 1269 1271 1278 1288
## [121] 1294 1297 1302 1308 1310 1323 1324 1349 1352 1355 1369 1374 1380 1388 1389
## [136] 1392 1409 1416 1420 1424 1426 1437 1443 1444 1446 1479 1513 1515 1532 1540
## [151] 1541 1554 1558 1560 1574 1589 1604 1608 1612 1620 1623 1638 1639 1640 1644
## [166] 1662 1676 1681 1692 1702 1703 1705 1725 1734 1736 1739 1754 1755 1760 1769
## [181] 1810 1816 1819 1822 1831 1839 1853 1861 1867 1874 1892 1902 1909 1940 1941
## [196] 1955 1971 1995
## 
## $strapRows[[7]]
##   [1]    9   27   28   45   46   56   82   86  103  126  129  138  147  156  164
##  [16]  172  189  202  206  208  210  220  221  229  235  258  268  270  275  276
##  [31]  307  310  313  318  335  337  341  390  404  407  424  425  460  463  481
##  [46]  490  501  522  524  532  539  542  588  591  615  616  621  628  631  662
##  [61]  706  716  717  746  766  771  785  790  798  804  827  847  849  858  873
##  [76]  875  884  902  944  946  960  967  995  998 1006 1018 1025 1046 1050 1063
##  [91] 1098 1102 1114 1122 1127 1129 1139 1141 1149 1153 1155 1156 1160 1163 1184
## [106] 1190 1191 1195 1218 1226 1239 1240 1263 1265 1274 1276 1285 1298 1303 1313
## [121] 1328 1333 1336 1337 1340 1345 1346 1356 1371 1373 1379 1390 1395 1404 1434
## [136] 1442 1460 1461 1463 1466 1472 1473 1481 1503 1516 1517 1548 1564 1567 1578
## [151] 1579 1595 1606 1607 1618 1622 1675 1679 1695 1713 1718 1721 1723 1726 1745
## [166] 1750 1758 1793 1820 1829 1840 1842 1848 1864 1872 1873 1877 1881 1893 1895
## [181] 1921 1928 1932 1952 1987 1989 1993 1997
## 
## $strapRows[[8]]
##   [1]   14   24   36   48   60   70   79   80   85   88   97  101  112  122  146
##  [16]  174  181  182  183  192  194  196  222  238  251  271  277  278  286  292
##  [31]  298  299  305  350  357  358  359  367  372  378  380  411  417  420  423
##  [46]  448  482  486  487  527  531  547  550  552  572  573  593  603  612  620
##  [61]  632  641  644  651  654  658  668  675  677  681  683  688  689  720  724
##  [76]  727  736  754  760  767  770  806  816  821  826  832  862  869  883  901
##  [91]  903  929  934  935  955  971  978  980  982  988 1010 1011 1044 1045 1056
## [106] 1073 1081 1089 1090 1107 1111 1115 1118 1119 1126 1128 1132 1151 1152 1168
## [121] 1177 1202 1203 1279 1281 1289 1291 1293 1300 1309 1319 1327 1329 1341 1348
## [136] 1350 1354 1364 1399 1415 1419 1421 1440 1458 1471 1482 1485 1486 1490 1491
## [151] 1511 1531 1535 1547 1551 1559 1565 1581 1592 1616 1630 1633 1659 1667 1668
## [166] 1669 1684 1698 1744 1761 1790 1815 1827 1860 1886 1889 1908 1935 1954 1959
## [181] 1962 1982 1990 1992
## 
## $strapRows[[9]]
##   [1]   15   17   19   20   30   37   58   63   64   77   78   81   94  107  114
##  [16]  130  141  143  144  145  150  151  152  153  167  177  197  201  204  219
##  [31]  231  243  247  249  260  282  287  294  295  296  302  303  309  312  319
##  [46]  326  330  342  346  362  371  401  410  416  426  447  465  474  479  494
##  [61]  498  500  505  506  507  509  512  517  519  526  529  537  546  562  564
##  [76]  565  568  587  590  594  602  617  622  637  643  645  657  676  680  690
##  [91]  700  704  711  725  726  730  735  737  744  747  763  765  773  793  801
## [106]  819  822  829  830  836  837  844  854  861  867  871  879  893  897  949
## [121]  958  959  970  973  986  987 1005 1008 1053 1066 1078 1103 1116 1133 1144
## [136] 1146 1174 1176 1181 1196 1206 1215 1229 1230 1260 1277 1290 1299 1315 1317
## [151] 1318 1321 1334 1339 1361 1383 1436 1445 1454 1464 1465 1477 1492 1496 1506
## [166] 1524 1529 1550 1562 1563 1583 1584 1591 1614 1615 1646 1664 1673 1678 1680
## [181] 1701 1710 1727 1735 1741 1743 1747 1763 1765 1768 1772 1784 1797 1798 1800
## [196] 1803 1807 1812 1824 1833 1841 1844 1863 1879 1890 1896 1904 1906 1907 1936
## [211] 1944 1949 1958 1964
## 
## $strapRows[[10]]
##   [1]   25   32   38   39   84  110  125  132  134  139  155  175  186  188  212
##  [16]  216  241  242  246  264  267  269  284  322  334  339  340  351  361  364
##  [31]  369  385  386  408  421  422  453  455  457  458  468  496  513  514  545
##  [46]  566  570  574  576  578  611  634  636  638  649  659  698  699  710  719
##  [61]  721  749  751  799  807  813  817  835  859  864  878  880  881  886  891
##  [76]  892  898  907  913  915  938  940  956  962  963  966  983  989 1012 1057
##  [91] 1060 1061 1062 1072 1075 1084 1095 1106 1123 1136 1138 1140 1150 1154 1157
## [106] 1164 1171 1178 1187 1194 1197 1217 1221 1224 1236 1237 1238 1248 1249 1258
## [121] 1270 1282 1284 1287 1314 1342 1344 1363 1370 1377 1387 1393 1397 1407 1408
## [136] 1410 1425 1427 1431 1438 1450 1455 1462 1470 1480 1523 1526 1536 1555 1557
## [151] 1568 1572 1590 1596 1597 1605 1609 1613 1629 1637 1656 1661 1663 1665 1674
## [166] 1677 1686 1688 1690 1699 1716 1730 1732 1742 1748 1752 1770 1773 1778 1783
## [181] 1789 1794 1796 1799 1801 1802 1811 1823 1828 1832 1849 1850 1876 1884 1903
## [196] 1920 1929 1930 1961 1973 1985 1988
## 
## 
## attr(,"class")
## [1] "ss"

Let us begin by exploring how the models are stored.

Models

sseMod1$models
## [[1]]
## [[1]][[1]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[2]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[3]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[4]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[5]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[6]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[7]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[8]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[9]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None 
## 
## [[1]][[10]]
## Principal Component Analysis 
## 
## No pre-processing
## Resampling: None

Models are organized as a list of lists. Each element of the outer list corresponds to one single-study learner (e.g., lm, random forests) and is itself a list of all the models trained with that learner. Each element of an inner list is a model trained on one study or study strap. Here we have only one single-study learner (PCR), so the outer list has length 1 and its single element is a list of 10 models, one per study.
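Given that layout, predictions from every model can be collected by iterating first over learners and then over studies. The base-R sketch below mimics the nesting with lm() fits standing in for the caret models the package stores; the second learner (a linear model using only V1) is a hypothetical stand-in for a second SSL.

```r
set.seed(1)
d <- data.frame(Study = rep(1:3, each = 30), V1 = rnorm(90), V2 = rnorm(90))
d$Y <- 2 * d$V1 + d$V2
by_study <- split(d, d$Study)

# models[[ssl]][[study]]: outer list over learners, inner list over studies
models <- list(
  lapply(by_study, function(s) lm(Y ~ V1 + V2, data = s)),  # stand-in for SSL 1
  lapply(by_study, function(s) lm(Y ~ V1, data = s))        # hypothetical SSL 2
)

newdata <- data.frame(V1 = c(0, 1), V2 = c(1, 1))

# One column per model, learner by learner: 2 learners x 3 studies = 6 columns
pred_mat <- do.call(cbind, lapply(models, function(ssl_models) {
  sapply(ssl_models, predict, newdata = newdata)
}))
dim(pred_mat)
```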

Model Info

modelInfo records how the models were fit; its entries are taken from the arguments supplied when fitting the model.

names(sseMod1$modelInfo)
##  [1] "sampling"     "numStraps"    "SSL"          "ssl.tuneGrid" "numPaths"    
##  [6] "convg.vec"    "convgCritera" "meanSamp"     "stack.type"   "custFNs"     
## [11] "bagSize"

Data Info

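dataInfo provides information about the raw data passed to the model-fitting functions, such as the study names and sample sizes. The original data itself is stored only if model = TRUE is specified; sseMod1 was fit without it, so its $data element is NULL.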

names(sseMod1$dataInfo)
## [1] "studyNames"  "sampleSizes"

Similarity Matrix

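simMat provides the similarity matrix used for Covariate Profile Similarity weights. It has one row per study (or study strap) and one column per similarity measure; for sseMod1, the single customFn_1 column assigns each of the 10 studies a weight of 0.1.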
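Assuming each column of simMat holds normalized weights over the models (as the 0.1 entries for sseMod1 suggest), a weighted ensemble prediction is the matrix-vector product of the model predictions with that column. A minimal sketch with made-up numbers:

```r
# Predictions from 3 models (columns) for 2 new observations (rows); illustrative values
pred_mat <- cbind(m1 = c(1, 2), m2 = c(3, 4), m3 = c(5, 6))

# Raw similarity scores for the 3 models, normalized to sum to one
sim <- c(2, 1, 1)
w <- sim / sum(sim)

# Weighted ensemble: each row is a weighted average of that observation's predictions
weighted_pred <- drop(pred_mat %*% w)
print(weighted_pred)
```

With equal weights, as in sseMod1, this product reduces to the simple average reported in the Avg column.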