[R] Caret and custom summary function

Max Kuhn mxkuhn at gmail.com
Mon May 11 17:37:23 CEST 2015


The version of caret just put on CRAN has a function called mnLogLoss that
does this.
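
As a rough sketch of how mnLogLoss might be used (the toy data frame `dat` and its values are made up for illustration and are not from the original message): a summary function receives a data frame with `obs`, `pred`, and one probability column per class level, named after the levels. For the data quoted below those columns would be `no` and `yes`, so a column such as `T` would not be present.

```r
library(caret)

# Hypothetical toy data in the layout caret hands to a summary function:
# observed class, predicted class, and one probability column per level.
lev <- c("no", "yes")
dat <- data.frame(
  obs  = factor(c("yes", "no", "yes"), levels = lev),
  pred = factor(c("yes", "no", "no"),  levels = lev),
  no   = c(0.2, 0.9, 0.6),
  yes  = c(0.8, 0.1, 0.4)
)

# Direct call; the result is a named numeric vector with the name "logLoss".
ll <- mnLogLoss(dat, lev = lev)

# The same function can be passed to trainControl in place of a custom one.
# classProbs = TRUE is needed so that the probability columns are computed.
tc <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                   classProbs = TRUE, summaryFunction = mnLogLoss)
```

With this control object, the call to train() would presumably use metric = "logLoss" (the name of the value mnLogLoss returns) together with maximize = FALSE.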

Max

On Mon, May 11, 2015 at 11:17 AM, Lorenzo Isella <lorenzo.isella at gmail.com>
wrote:

> Dear All,
> I am trying to implement my own metric (a log loss metric) for a
> binary classification problem in Caret.
> I must be making some mistake, because I cannot get anything sensible
> out of it.
> I paste below a numerical example which should run in about one
> minute on any laptop.
> When I run it, I end up with output like this:
>
>
>
>
> Aggregating results
> Something is wrong; all the LogLoss metric values are missing:
>     LogLoss
>  Min.   : NA
>  1st Qu.: NA
>  Median : NA
>  Mean   :NaN
>  3rd Qu.: NA
>  Max.   : NA
>  NA's   :40
> Error in train.default(x, y, weights = w, ...) : Stopping
> In addition: Warning message:
> In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
>   There were missing values in resampled performance measures.
>
>
>
>
> Any suggestion is appreciated.
> Many thanks
>
> Lorenzo
>
>
>
>
>
> ####################################################
>
> library(caret)
> library(C50)
>
>
> LogLoss <- function (data, lev = NULL, model = NULL)
> {
>    probs <- pmax(pmin(as.numeric(data$T), 1 - 1e-15), 1e-15)
>    logPreds <- log(probs)
>    log1Preds <- log(1 - probs)
>    real <- (as.numeric(data$obs) - 1)
>    out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
>    names(out) <- c("LogLoss")
>    out
> }
>
>
>
>
>
>
> train <- matrix(ncol=5,nrow=200,NA)
>
> train <- as.data.frame(train)
> names(train) <- c("donation", "x1","x2","x3","x4")
>
> set.seed(134)
>
> sel <- sample(nrow(train), 0.5*nrow(train))
>
>
> train$donation[sel] <- "yes"
> train$donation[-sel] <- "no"
>
> train$x1 <- seq(nrow(train))
> train$x2 <- rnorm(nrow(train))
> train$x3 <- 1/train$x1
> train$x4 <- sample(nrow(train))
>
> train$donation <- as.factor(train$donation)
>
> c50Grid <- expand.grid(trials = 1:10,
>                        model = c("tree", "rules"),
>                        winnow = c(TRUE, FALSE))
>
>
>
>
>
> tc <- trainControl(method = "repeatedCV", summaryFunction=LogLoss,
>                   number = 10, repeats = 10, verboseIter=TRUE,
>                   classProbs=TRUE)
>
>
> model <- train(donation~., data=train, method="C5.0", trControl=tc,
>               metric="LogLoss", maximize=FALSE, tuneGrid=c50Grid)
>
>
>
