[R-sig-ME] choice of prediction error calculations for zero inflated model
Andras Farkas
motyoc@k@ @end|ng |rom y@hoo@com
Sun Mar 31 16:20:05 CEST 2019
Dimitris,
thank you, very helpful.... I took a look at your scoring_rules function and attempted to rewrite it for a zero-inflated Poisson based on the output of pscl's predict methods. Would you mind taking a quick look? Maybe it will become useful for someone else eventually:
library(caret)
library(pscl)

data("bioChemists", package = "pscl")

## 70/30 train/validation split (consider set.seed() for reproducibility)
split <- 0.7
inTrain <- createDataPartition(bioChemists[, 1], p = split, list = FALSE)
training <- bioChemists[inTrain, ]
test <- bioChemists[-inTrain, ]

## zero-inflated Poisson with all covariates in both model parts
fm_zip2 <- zeroinfl(art ~ . | ., data = training)

y <- test$art
n <- length(y)
max_count <- rep(5000, length.out = n)  # upper bound of the count support

## zero-inflated Poisson pmf:
## P(Y = 0) = pis + (1 - pis) * dpois(0, mean)
## P(Y = x) = (1 - pis) * dpois(x, mean), for x > 0
prob_fun <- function (x, mean, pis) {
    ind0 <- x == 0
    out <- (1 - pis) * dpois(x, lambda = mean)
    out[ind0] <- pis + out[ind0]
    out
}

max_count_seq <- lapply(max_count, seq, from = 0)
## use type = "count" to get the Poisson-part mean (lambda) that prob_fun
## expects; type = "response" would give the overall mean (1 - pis) * lambda
pred <- predict(fm_zip2, newdata = test, type = "count")
pred_zi <- predict(fm_zip2, newdata = test, type = "zero") # zero probs equal to your zi_probs attribute generated by predict.MixMod?
logarithmic <- quadratic <- spherical <- numeric(n)
for (i in seq_len(n)) {
    p_y <- prob_fun(y[i], mean = pred[i], pis = pred_zi[i])
    quadrat_p <- sum(prob_fun(max_count_seq[[i]], mean = pred[i],
                              pis = pred_zi[i])^2)
    logarithmic[i] <- log(p_y)
    quadratic[i] <- 2 * p_y - quadrat_p  # note the minus sign in the quadratic (Brier) score
    spherical[i] <- p_y / sqrt(quadrat_p)
}
result <- data.frame(logarithmic = logarithmic, quadratic = quadratic,
                     spherical = spherical)
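As a quick sanity check on the scoring logic, the zero-inflated pmf should sum to essentially 1 over a sufficiently large support, and the logarithmic score of any observation should be negative. A minimal, self-contained sketch in base R, using made-up values for the mean and the zero probability:

```r
## same pmf as in the script above
prob_fun <- function (x, mean, pis) {
    ind0 <- x == 0
    out <- (1 - pis) * dpois(x, lambda = mean)
    out[ind0] <- pis + out[ind0]
    out
}

support <- 0:5000                                  # truncated count support
p <- prob_fun(support, mean = 1.8, pis = 0.3)      # hypothetical lambda and pi
total <- sum(p)                                    # should be ~1
log_score <- log(prob_fun(0, mean = 1.8, pis = 0.3))  # score for an observed zero
```

If `total` is noticeably below 1 for some test case, the chosen `max_count` is too small for that predicted mean.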
thanks
Andras
On Saturday, March 30, 2019, 9:25:25 AM EDT, D. Rizopoulos <d.rizopoulos using erasmusmc.nl> wrote:
You could have a look at proper scoring rules: https://en.m.wikipedia.org/wiki/Scoring_rule
For an example in mixed models check this example in the GLMMadaptive package: https://drizopoulos.github.io/GLMMadaptive/articles/Dynamic_Predictions.html
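For intuition, the logarithmic score is simply the log of the predictive pmf evaluated at the observed count (closer to zero is better). A toy base-R sketch with made-up values, using a plain Poisson predictive distribution for simplicity:

```r
y_obs <- 2         # an observed count
lambda_hat <- 1.5  # hypothetical predicted Poisson mean
log_score <- dpois(y_obs, lambda = lambda_hat, log = TRUE)
```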
Best,
Dimitris
Sent with my iPhone - apologies for typos
From: Andras Farkas via R-sig-mixed-models <r-sig-mixed-models using r-project.org>
Date: Saturday, 30 Mar 2019, 13:49
To: R-sig-mixed-models <r-sig-mixed-models using r-project.org>
Subject: [R-sig-ME] choice of prediction error calculations for zero inflated model
Hello All,
thought I would reach out to see if you have some guidance on the following: I am working with a zero-inflated data set and fitting models that should be reasonable for such data (zeroinfl() and hurdle() from pscl, mixtures from flexmix, glm.nb, etc.), and I am trying to compare predictive performance on a validation data set (70% of the data was used to train and the remaining 30% for validation)... This is not necessarily a coding question but rather a stats-oriented one, although a working example would be helpful if there is one: I have looked extensively for literature on measures of predictive performance of zero-inflated models on a validation data set, comparing observed vs. predicted responses for count data, but could not come up with much. I am familiar with general measures of performance, like RMSE, MAE, etc., but am finding little on their appropriateness in my setting. Some references point towards adjusted/pseudo R-squared approaches, but most of those evaluate model fit during model development rather than predictive performance on a validation set... Any thoughts or directions you may be able to help me with?
much appreciate your input...
thanks,
Andras
_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models mailing list