[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
Tobias Sing
tobias.sing at gmail.com
Thu Sep 24 15:57:30 CEST 2009
Tim,
if I understand correctly, you are trying to get the numerical values
of the averaged cross-validation curves.
Unfortunately, the plot function of ROCR does not return anything in
the current version (changing this is a good suggestion).
As a quick fix, you could modify the plot.performance function of
ROCR so that it returns the values you want.
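Alternatively, the averaging can be done by hand outside of plot(). The
following is only a minimal sketch, assuming that threshold averaging means
interpolating each run's x and y values onto a common grid of cutoffs and
taking the mean across runs. The helper name avg_by_threshold and its
argument n are hypothetical (not part of ROCR), the toy numbers are made up,
and the details may differ from what plot.performance does internally:

```r
# Hypothetical helper (not part of ROCR): average several curves by threshold.
# Each list argument holds one numeric vector per cross-validation run;
# the vectors may have different lengths from run to run.
avg_by_threshold <- function(x_list, y_list, alpha_list, n = 100) {
  # Ignore non-finite cutoffs (the first cutoff of a run is typically Inf)
  keep <- lapply(alpha_list, is.finite)
  # Restrict the grid to the cutoff range covered by every run
  lo <- max(mapply(function(a, k) min(a[k]), alpha_list, keep))
  hi <- min(mapply(function(a, k) max(a[k]), alpha_list, keep))
  stopifnot(lo < hi)
  grid <- seq(hi, lo, length.out = n)
  # Linearly interpolate one run's values onto the common grid
  interp <- function(v, a, k) approx(a[k], v[k], xout = grid, ties = mean)$y
  x_mat <- mapply(interp, x_list, alpha_list, keep)  # n rows, one column per run
  y_mat <- mapply(interp, y_list, alpha_list, keep)
  data.frame(cutoff = grid, x = rowMeans(x_mat), y = rowMeans(y_mat))
}

# Toy data: two runs of different lengths (made-up values for illustration)
alpha <- list(c(Inf, 0.9, 0.5, 0.1), c(Inf, 0.8, 0.6, 0.4, 0.2))
fpr   <- list(c(0, 0, 0.5, 1),       c(0, 0, 0.25, 0.5, 1))
tpr   <- list(c(0, 0.5, 1, 1),       c(0, 0.25, 0.5, 1, 1))
avg <- avg_by_threshold(fpr, tpr, alpha, n = 5)
print(avg)
```

With a ROCR performance object, the three lists would presumably be the
x.values, y.values and alpha.values slots of perf. A "best" cutoff could then
be read off the returned data frame, for example the row that maximizes
y - x. Note that a run with fewer than two finite cutoffs cannot be
interpolated this way.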
Kind regards,
Tobias
On Thu, Sep 24, 2009 at 3:09 PM, Tim Howard <tghoward at gw.dec.state.ny.us> wrote:
> All,
> I'm trying again with a slightly more generic version of my first question. I can extract the
> plotted values from hist(), boxplot(), and even plot.randomForest(). Observe:
>
> # get some data
> dat <- rnorm(100)
> # grab histogram data
> hdat <- hist(dat)
> hdat #provides details of the hist output
>
> #grab boxplot data
> bdat <- boxplot(dat)
> bdat #provides details of the boxplot output
>
> # the same works for randomForest
> library(randomForest)
> data(mtcars)
> RFdat <- plot(randomForest(mpg ~ ., mtcars, keep.forest=FALSE, ntree=100), log="y")
> RFdat
>
>
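> ## But I can't use this method in ROCR
> library(ROCR)
> data(ROCR.xval)
> # build the performance object before plotting
> pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
> perf <- performance(pred, "tpr", "fpr")
> RCdat <- plot(perf, avg="threshold")
>
> RCdat
> ## output: NULL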
>
> Does anyone have any tricks for piping or extracting these data?
> Or, perhaps for steering me in another direction?
>
> Thanks,
> Tim
>
>
> From: "Tim Howard" <tghoward at gw.dec.state.ny.us>
> Subject: [R] ROCR.plot methods, cross validation averaging
> To: <osander at mpi-sb.mpg.de>, <tobias.sing at mpi-sb.mpg.de>,
> <r-help at r-project.org>
> Message-ID: <4ABA1079.6D16.00D5.0 at gw.dec.state.ny.us>
> Content-Type: text/plain; charset=US-ASCII
>
> Dear R-help and ROCR developers (Tobias Sing and Oliver Sander) -
>
> I think my first question is generic and could apply to many methods,
> which is why I'm directing this initially to R-help as well as Tobias and Oliver.
>
> Question 1. The plot function in ROCR will average your cross validation
> data if asked. I'd like to use that averaged data to find a "best" cutoff
> but I can't figure out how to grab the actual data that get plotted.
> A simple redirect of the plot (such as test <- plot(mydata)) doesn't do it.
>
> Question 2. I am asking ROCR to average lists with varying lengths for
> each list entry. See my example below. None of the ROCR examples have data
> structured in this manner. Can anyone speak to whether the averaging
> methods in ROCR allow for this? If I can't easily grab the data as desired
> from Question 1, can someone help me figure out how to average the lists,
> by threshold, similarly?
>
> Question 3. If my cross validation data happen to have a list entry whose
> length = 2, ROCR errors out. Please see the second part of my example.
> Any suggestions?
>
> #reproducible examples exemplifying my questions
> ##part one##
> library(ROCR)
> data(ROCR.xval)
> # set up data so it looks more like my real data
> sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
> testSet <- ROCR.xval
> # do the extraction
> for (i in seq_along(ROCR.xval[[1]])) {
>   y <- sample(1:350, sampSize[i])
>   testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
>   testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
> }
> # now massage the data using ROCR, set up for a ROC plot
> # if it errors out here, run the above sample again.
> pred <- prediction(testSet$predictions, testSet$labels)
> perf <- performance(pred,"tpr","fpr")
> # create the ROC plot, averaging by cutoff value
> plot(perf, avg="threshold")
> # check out the structure of the data
> str(perf)
> # note the ragged lengths of the list entries; I assume that averaging,
> # whether vertical, horizontal, or by threshold, somehow
> # accounts for this?
>
> ## part two ##
> # add a list entry with only two values
> perf@x.values[[1]] <- c(0,1)
> perf@y.values[[1]] <- c(0,1)
> perf@alpha.values[[1]] <- c(Inf,0)
>
> plot(perf, avg="threshold")
>
> ##output results in an error with this message
> # Error in if (from == to) rep.int(from, length.out) else as.vector(c(from, :
> # missing value where TRUE/FALSE needed
>
>
> Thanks in advance for your help
> Tim Howard
> New York Natural Heritage Program
>