[R] How to show a specific value of a ggplot2
Rui Barradas
ruipbarradas using sapo.pt
Fri May 27 12:17:17 CEST 2022
Hello,
I think the function find_x_from_profile below does what you want.
I have used the data set in the first example of ?readARFF, the built-in
and ever-present data set iris.
The function returns a one-row data.frame whose column names are "x"
and "y". Pass the y-axis value in argument ynew; the value you want
is returned in column "x".
The function only takes one y value at a time, but this can be changed if
needed.
suppressPackageStartupMessages({
  library(farff)
  library(mlr3)
  library(mlr3learners)
  library(mlr3filters)
  library(mlr3extralearners)
  library(DALEX)
  library(DALEXtra)
  library(readr)
  library(ggplot2)
})
# make the results reproducible
set.seed(2022)
# this is the data for the reprex
path <- tempfile()
writeARFF(iris, path = path)
data <- readARFF(path)
# Parse with reader=readr : C:\Users\ruipb\AppData\Local\Temp\RtmpUxSDP3\file578778a1417
# header: 0.000000; preproc: 0.000000; data: 0.110000; postproc: 0.000000; total: 0.110000
# data = readARFF("ant.arff")
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]
task <- TaskRegr$new("data", backend = train, target = "Sepal.Length")
learner <- lrn("regr.randomForest")
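model <- learner$train(task)
# 'test[, -16]' and the '- 1' below are carried over from the original
# post; iris has only 5 columns, so all of them are kept here.
explainer <- explain_mlr3(model,
                          data = test[, -16],
                          y = as.numeric(test$Sepal.Length) - 1,
                          label = "RF")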
# Preparation of a new explainer is initiated
#   -> model label       : RF
#   -> data              : 45 rows 5 cols
#   -> target variable   : 45 values
#   -> predict function  : yhat.LearnerRegr will be used ( default )
#   -> predicted values  : No value for predict function target column. ( default )
#   -> model_info        : package mlr3 , ver. 0.13.3 , task regression ( default )
#   -> predicted values  : numerical, min = 4.775823 , mean = 5.892271 , max = 7.226967
#   -> residual function : difference between y and yhat ( default )
#   -> residuals         : numerical, min = -1.642701 , mean = -0.9922714 , max = -0.2101927
# A new explainer has been created!
m <- model_profile(explainer = explainer, variables = "Sepal.Width")
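find_x_from_profile <- function(model, xvar, ynew) {
  if(length(ynew) > 1) {
    warn <- "'ynew' length is greater than 1, only the first is considered."
    warning(warn)
    ynew <- ynew[1]
  }
  # use the profile passed in 'model' (not the global 'm'),
  # restricted to the variable named in 'xvar'
  ap <- model$agr_profiles
  ap <- ap[ap[["_vname_"]] == xvar, c("_yhat_", "_x_")]
  names(ap) <- c("yhat", "x")
  # locate ynew among the sorted profile values
  i <- order(ap$yhat)
  ap <- ap[i, ]
  j <- findInterval(ynew, ap$yhat)
  # order(i) restores the original row order, so the bracket j:(j + 1)
  # is only guaranteed to contain ynew when yhat increases along the rows
  olddata <- data.frame(
    x = ap$yhat[order(i)][j:(j + 1)],
    y = ap$x[order(i)][j:(j + 1)]
  )
  # linear interpolation, with yhat playing the role of "x" in approx()
  newdata <- approx(olddata, xout = ynew)
  newdata <- as.data.frame(newdata)
  names(newdata) <- rev(names(newdata))
  newdata[2:1]
}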
find_x_from_profile(m, xvar = "Sepal.Width", 5.85)
# x y
# 1 2.941472 5.85
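If several y values are needed, or the profile is not monotone (so that one y value can correspond to more than one x), something along the following lines might work. This is only a sketch in base R on a made-up profile: the data frame profile and the function find_x_for_y are hypothetical names, standing in for the columns "_x_" and "_yhat_" of m$agr_profiles.

```r
# Toy stand-in for the aggregated profile (invented values, not DALEX output)
profile <- data.frame(
  x    = c(2.0, 2.5, 3.0, 3.5, 4.0),
  yhat = c(6.0, 5.9, 5.8, 5.9, 6.1)
)

# For each target value in ynew, return every x at which the
# piecewise-linear profile crosses that value.
find_x_for_y <- function(x, yhat, ynew) {
  o <- order(x)          # work along increasing x
  x <- x[o]
  yhat <- yhat[o]
  n <- length(x)
  lo <- pmin(yhat[-n], yhat[-1])   # lower end of each segment
  hi <- pmax(yhat[-n], yhat[-1])   # upper end of each segment
  out <- lapply(ynew, function(y0) {
    # segments that bracket y0; flat segments are skipped
    k <- which(lo <= y0 & y0 <= hi & yhat[-n] != yhat[-1])
    if (length(k) == 0L) {
      return(data.frame(x = numeric(0), y = numeric(0)))
    }
    # linear interpolation inside each bracketing segment
    xs <- x[k] + (y0 - yhat[k]) * (x[k + 1] - x[k]) / (yhat[k + 1] - yhat[k])
    # note: a target equal to a grid value can be reported by two segments
    data.frame(x = xs, y = y0)
  })
  do.call(rbind, out)
}

crossings <- find_x_for_y(profile$x, profile$yhat, ynew = c(5.85, 6.05))
print(crossings)
```

Values of ynew outside the range of the profile simply produce no rows, instead of an NA.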
newdata <- find_x_from_profile(m, xvar = "Sepal.Width", 5.85)
p <- plot(m)
p +
  geom_point(
    data = newdata,
    mapping = aes(x, y),
    color = "red",
    size = 2,
    inherit.aes = FALSE
  )
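For the original question, the value of y at a given x, the same kind of linear interpolation can be run in the forward direction with approx(). A minimal sketch, assuming a made-up profile with x between 0 and 200 (the numbers are invented; with the real object the columns "_x_" and "_yhat_" of m$agr_profiles would take their place):

```r
# Toy stand-in for the aggregated profile (invented values, not DALEX output)
profile <- data.frame(
  x    = c(0, 50, 100, 150, 200),
  yhat = c(0.2, 0.6, 1.4, 1.9, 2.1)
)

# Linear interpolation of the profile at x = 75
y_at_75 <- approx(profile$x, profile$yhat, xout = 75)$y

# One-row data frame in the same x/y layout used for the red point
newdata75 <- data.frame(x = 75, y = y_at_75)
print(newdata75)
```

The resulting one-row data frame has the same layout as newdata above, so it can be given to geom_point() in the same way.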
Hope this helps,
Rui Barradas
At 08:54 on 27/05/2022, Neha gupta wrote:
> I am sorry for that.
>
> I used
>
> library(farff)
> library(mlr3learners)
> library(mlr3filters)
> library(mlr3extralearners)
> library(mlr3)
> library(DALEX)
> library(DALEXtra)
>
> data = readARFF("ant.arff")
> index= sample(1:nrow(data), 0.7*nrow(data))
> train= data[index,]
> test= data[-index,]
> task = TaskRegr$new("data", backend = train, target = "bug")
>
> learner= lrn("regr.randomForest")
> model= learner$train(task )
>
> explainer = explain_mlr3(model,
> data = test[,-16],
> y = as.numeric(test$bug)-1,
> label="RF")
>
> m=model_profile(explainer = explainer, variables = "rfc")
>
> plot(m)
>
> And it shows a plot, with rfc on the x axis and bug on the y axis.
>
> I can read the value of bug at rfc=75 off the plot by eye, but I need the
> exact value, and a guess made from the plot might not be the exact value
> I need.
>
> Thank you
>
> On Fri, May 27, 2022 at 9:39 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
> Neha, this is not the first time you have posted questions to R-Help;
> please, please! start your scripts by loading the packages needed.
>
> I have never used package DALEX, but from what I understand of its
> documentation it helps to explore and explain model behavior. If your
> profile plot was output by method plot.model_profile(), the workflow is
> or seems to be
>
> 1. fit a model;
> 2. create an object of S3 class "model_profile" with functions explain()
>    and model_profile();
> 3. plot that object.
>
>
> So to know the value of y for a given x, predict from the fitted model;
> package DALEX and its plots have nothing to do with it.
> If there's a predict method for the fitting function, then it should be
> as simple as
>
>
> newdata75 <- data.frame(x = 75)
> y75 <- predict(fit, newdata = newdata75)
>
>
> or something similar.
>
> I have never used this package so I might be completely wrong.
>
> Hope this helps,
>
> Rui Barradas
>
> At 08:09 on 27/05/2022, Neha gupta wrote:
> > Thank you Rui, Avi
> >
> > I am using plot() from the DALEX package, which is implemented with
> > ggplot2.
> >
> > So I only used plot(mydata) and it displays the ggplot. If we need to
> > adjust or make further changes to the plot, I think people use
> >
> > plot + .....
> > I don't know if this group supports image pasting, but my plot looks
> > like the one below. (bug is a variable in my data whose values are
> > displayed on the y-axis and rfc is another variable in my dataset whose
> > values are shown on the x-axis.) I want to know exactly (not necessarily
> > using the plot, a simple print function would also work for me) what
> > the value of 'bug' is when the value of 'rfc' is 75.
> >
> > image.png
> >
> >
> > On Fri, May 27, 2022 at 7:49 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >
> > Hello,
> >
> > If you cannot determine the exact value of y for given x, then isn't
> > your problem how to determine an approximate value of y? Once you have
> > it, it's easy to plot it.
> >
> > With newdata = data.frame(x = 75, y = ???),
> >
> >
> > ggplot(mydata, mapping = aes(x, y)) +
> > geom_point(color = "black") +
> > geom_point(newdata, mapping = aes(x, y), color = "red") +
> > xlim(0, 200)
> >
> >
> > The question is how to find newdata$y: interpolation, or some other method?
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > At 00:40 on 27/05/2022, Neha gupta wrote:
> > > I have a ggplot2 plot which has x values 0-200 and y values 0-10
> > >
> > > p=plot(mydata)
> > > p+xlim(0, 200)
> > >
> > > I want to show what the y value is when x is 75. The graph that is
> > > displayed has a broad range (like 0-50, 50-100, etc. on the x axis),
> > > so I cannot determine the exact value of y at 75 on the x-axis.
> > >
> > > Thank you