[R-sig-ME] choice of prediction error calculations for zero inflated model
Andras Farkas
motyoc@k@ @end|ng |rom y@hoo@com
Sun Mar 31 16:20:05 CEST 2019
Dimitris,
thank you, very helpful.... I took a look at your scoring_rules function and attempted to rewrite it for a zero-inflated Poisson based on the output of pscl's predict methods. Would you mind taking a quick look? Maybe it will become useful for someone else eventually:
library(caret)
library(pscl)

data("bioChemists", package = "pscl")

## 70/30 train/validation split (consider set.seed() for reproducibility)
split <- 0.7
inTrain <- createDataPartition(bioChemists[, 1], p = split, list = FALSE)
training <- bioChemists[inTrain, ]
test <- bioChemists[-inTrain, ]

## zero-inflated Poisson with all covariates in both model parts
fm_zip2 <- zeroinfl(art ~ . | ., data = training)

y <- test$art
n <- length(y)
max_count <- rep(5000, length.out = n)  # upper bound of the count support

## zero-inflated Poisson pmf:
## P(Y = 0) = pis + (1 - pis) * dpois(0, mean)
## P(Y = x) = (1 - pis) * dpois(x, mean), for x > 0
prob_fun <- function (x, mean, pis) {
    ind0 <- x == 0
    out <- (1 - pis) * dpois(x, lambda = mean)
    out[ind0] <- pis + out[ind0]
    out
}

max_count_seq <- lapply(max_count, seq, from = 0)
## use type = "count" to get the Poisson-part mean (lambda) that prob_fun
## expects; type = "response" would give the overall mean (1 - pis) * lambda
pred <- predict(fm_zip2, newdata = test, type = "count")
pred_zi <- predict(fm_zip2, newdata = test, type = "zero") # zero probs equal to your zi_probs attribute generated by predict.MixMod?
logarithmic <- quadratic <- spherical <- numeric(n)
for (i in seq_len(n)) {
    p_y <- prob_fun(y[i], mean = pred[i], pis = pred_zi[i])
    quadrat_p <- sum(prob_fun(max_count_seq[[i]], mean = pred[i],
                              pis = pred_zi[i])^2)
    logarithmic[i] <- log(p_y)
    quadratic[i] <- 2 * p_y - quadrat_p  # note the minus sign in the quadratic (Brier) score
    spherical[i] <- p_y / sqrt(quadrat_p)
}
result <- data.frame(logarithmic = logarithmic, quadratic = quadratic,
                     spherical = spherical)
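As a quick sanity check on the scoring logic, the zero-inflated pmf should sum to essentially 1 over a sufficiently large support, and the logarithmic score of any observation should be negative. A minimal, self-contained sketch in base R, using made-up values for the mean and the zero probability:

```r
## same pmf as in the script above
prob_fun <- function (x, mean, pis) {
    ind0 <- x == 0
    out <- (1 - pis) * dpois(x, lambda = mean)
    out[ind0] <- pis + out[ind0]
    out
}

support <- 0:5000                                  # truncated count support
p <- prob_fun(support, mean = 1.8, pis = 0.3)      # hypothetical lambda and pi
total <- sum(p)                                    # should be ~1
log_score <- log(prob_fun(0, mean = 1.8, pis = 0.3))  # score for an observed zero
```

If `total` is noticeably below 1 for some test case, the chosen `max_count` is too small for that predicted mean.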
thanks
Andras
On Saturday, March 30, 2019, 9:25:25 AM EDT, D. Rizopoulos <d.rizopoulos using erasmusmc.nl> wrote:
You could have a look at proper scoring rules: https://en.m.wikipedia.org/wiki/Scoring_rule
For an example in mixed models check this example in the GLMMadaptive package: https://drizopoulos.github.io/GLMMadaptive/articles/Dynamic_Predictions.html
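For intuition, the logarithmic score is simply the log of the predictive pmf evaluated at the observed count (closer to zero is better). A toy base-R sketch with made-up values, using a plain Poisson predictive distribution for simplicity:

```r
y_obs <- 2         # an observed count
lambda_hat <- 1.5  # hypothetical predicted Poisson mean
log_score <- dpois(y_obs, lambda = lambda_hat, log = TRUE)
```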
Best,
Dimitris
Sent with my iPhone - apologies for typos
From: Andras Farkas via R-sig-mixed-models <r-sig-mixed-models using r-project.org>
Date: Saturday, 30 Mar 2019, 13:49
To: R-sig-mixed-models <r-sig-mixed-models using r-project.org>
Subject: [R-sig-ME] choice of prediction error calculations for zero inflated model
Hello All,
thought I would reach out to see if you have some guidance on the following: I am working with a zero-inflated data set and fitting models that should be reasonable for such data (zeroinfl() and hurdle() from pscl, mixtures from flexmix, glm.nb, etc.), and I am trying to compare predictive performance on a validation data set (70% of the data was used to train and the remaining 30% for validation)... This is not necessarily a coding question but rather a stats-oriented one, although a working example would be helpful if there is one: I have looked extensively for literature on measures of predictive performance of zero-inflated models on a validation data set, comparing observed vs. predicted responses for count data, but could not come up with much. I am familiar with general measures of performance, like RMSE, MAE, etc., but am finding little on their appropriateness in my setting. Some references point towards adjusted/pseudo R-squared approaches, but most of those evaluate model fit during model development rather than predictive performance on a validation set... Any thoughts or directions you may be able to help me with?
much appreciate your input...
thanks,
Andras
_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models mailing list