[R-sig-teaching] Legend for curve fit plot
Ista Zahn
istazahn at gmail.com
Wed Dec 31 02:58:26 CET 2014
Hi Stan,
Here is one way to get the legend (I've also cleaned up the write statements):
dp4dsFit <- function(dataFrame,
indepVarName,
depVarName,
xLabel = indepVarName,
yLabel = depVarName) {
library(ggplot2)
library(labeling)
dp4dsQuadraticFit <- lm(dataFrame[,depVarName] ~
poly(dataFrame[,indepVarName],2))
cat(
"=============\r
Quadratic fit\r
=============\r")
print(summary(dp4dsQuadraticFit))
dp4dsNlogNFit <- lm(dataFrame[,depVarName] ~
dataFrame[,indepVarName]:log(dataFrame[,indepVarName]) +
dataFrame[,indepVarName])
cat(
"==========\r
n lg n fit\r
==========\r")
print(summary(dp4dsNlogNFit))
dataFrame <- rbind(data.frame(dataFrame,
predicted = predict(dp4dsQuadraticFit),
model = "Quadratic"),
data.frame(dataFrame,
predicted = predict(dp4dsNlogNFit),
model = "n lg n"))
ggplot() +
geom_point(data = subset(dataFrame, model == "Quadratic"),
aes_string(x = indepVarName, y = depVarName),
size = 3) +
geom_line(data = dataFrame,
aes_string(x = indepVarName, y = "predicted", color =
"model")) +
xlab(label = xLabel) +
ylab(label = yLabel)
}
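The legend appears because the two sets of predictions are stacked into one long data frame and the color aesthetic is mapped to the model column. If you want specific colors (red for the quadratic fit, blue for the n lg n fit), one option is scale_color_manual(). Below is a minimal sketch, not a drop-in replacement; the timings data frame and its values are made up purely for illustration, and the legend title "Curve fit" is my own choice.

```r
library(ggplot2)

# Hypothetical timing data, invented for this illustration only
timings <- data.frame(n  = c(100, 200, 400, 800, 1600),
                      ms = c(2.1, 4.9, 11.2, 25.8, 57.0))

quadFit  <- lm(ms ~ poly(n, 2), data = timings)
nLogNFit <- lm(ms ~ n:log(n) + n, data = timings)

# Stack one copy of the data per model, as dp4dsFit does above
stacked <- rbind(
  data.frame(timings, predicted = predict(quadFit),  model = "Quadratic"),
  data.frame(timings, predicted = predict(nLogNFit), model = "n lg n"))

# Mapping color to `model` creates the legend;
# scale_color_manual() pins each model name to a fixed color
p <- ggplot() +
  geom_point(data = timings, aes(x = n, y = ms), size = 3) +
  geom_line(data = stacked, aes(x = n, y = predicted, color = model)) +
  scale_color_manual(name = "Curve fit",
                     values = c("Quadratic" = "red", "n lg n" = "blue"))
print(p)
```

Naming the elements of values ties each color to a model label rather than to the order in which the labels happen to be sorted.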
But this is not the R way(tm). The R way is to give your user control
over the output by returning values from your functions, and writing
print or summary methods. Here is how I would go about it:
dp4dsFit <- function(dataFrame,
indepVarName,
depVarName) {
dp4dsQuadraticFit <- lm(dataFrame[,depVarName] ~
poly(dataFrame[,indepVarName],2))
dp4dsNlogNFit <- lm(dataFrame[,depVarName] ~
dataFrame[,indepVarName]:log(dataFrame[,indepVarName]) +
dataFrame[,indepVarName])
dataFrame <- rbind(data.frame(dataFrame,
predicted = predict(dp4dsQuadraticFit),
model = "Quadratic"),
data.frame(dataFrame,
predicted = predict(dp4dsNlogNFit),
model = "n lg n"))
R <- list(dp4dsQuadraticFit = dp4dsQuadraticFit,
dp4dsNlogNFit = dp4dsNlogNFit,
dataFrame = dataFrame,
indepVarName = indepVarName,
depVarName = depVarName)
class(R) <- c("dp4dsFit", class(R))
return(R)
}
print.dp4dsFit <- function(x) {
cat(
"=============\r
Quadratic fit\r
=============\r")
print(x$dp4dsQuadraticFit)
cat(
"==========\r
n lg n fit\r
==========\r")
print(x$dp4dsNlogNFit)
}
summary.dp4dsFit <- function(x, plot = FALSE) {
R <- sapply(x[1:2],
summary,
simplify=FALSE)
if(plot) print(plot(x))
class(R) <- c("dp4dsFit", class(R))
return(R)
}
plot.dp4dsFit <- function(x, xLabel = x$indepVarName, yLabel = x$depVarName) {
library(ggplot2)
ggplot() +
geom_point(data = subset(x$dataFrame, model == "Quadratic"),
aes_string(x = x$indepVarName, y = x$depVarName),
size = 3) +
geom_line(data = x$dataFrame,
aes_string(x = x$indepVarName, y = "predicted", color =
"model")) +
xlab(label = xLabel) +
ylab(label = yLabel)
}
## now you can do it all in one:
models <- dp4dsFit(mtcars, "mpg", "hp")
summary(models, plot=TRUE)
## or just plot it
plot(models)
## or just look at the model summaries
summary(models)
## or do something else entirely:
par( mfcol = c(2, 1))
plot(models[[1]], which = 1)
plot(models[[2]], which = 1)
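One more thing a user might want from the returned models is predictions at new x values. Because the fits above are written as lm(dataFrame[,depVarName] ~ ...), the model terms are not plain column names, which makes predict() with newdata awkward. A possible alternative, sketched here under my own assumptions (fitByName and the list element names are hypothetical, not part of the code above), is to build the formulas from the column names and pass data = to lm():

```r
# Hypothetical variant of the fitting step: build each formula from the
# column names so that lm() records real variable names
fitByName <- function(dataFrame, indepVarName, depVarName) {
  quadFormula  <- as.formula(sprintf("%s ~ poly(%s, 2)",
                                     depVarName, indepVarName))
  nLogNFormula <- as.formula(sprintf("%s ~ %s:log(%s) + %s",
                                     depVarName, indepVarName,
                                     indepVarName, indepVarName))
  list(quadratic = lm(quadFormula,  data = dataFrame),
       nLogN     = lm(nLogNFormula, data = dataFrame))
}

fits <- fitByName(mtcars, "mpg", "hp")

# Predict on an evenly spaced grid of new x values
grid <- data.frame(mpg = seq(min(mtcars$mpg), max(mtcars$mpg),
                             length.out = 50))
grid$quadratic <- predict(fits$quadratic, newdata = grid)
grid$nLogN     <- predict(fits$nLogN,     newdata = grid)
head(grid)
```

This assumes the column names are syntactically valid R names; names containing spaces would need backticks in the formula string.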
Best,
Ista
On Tue, Dec 30, 2014 at 5:49 PM, Warford, Stan
<Stan.Warford at pepperdine.edu> wrote:
> Hello all,
>
> I provide a function for my students to do two curve fits with a single set of data:
>
> # Performs two curve fits, quadratic and n lg n, with a plot of the data and the two curves
> # First parameter: A data frame
> # Second parameter: Name of the independent (x) variable
> # Third parameter: Name of the dependent (y) variable
> # Fourth parameter: The label for the x-axis
> # Fifth parameter: The label for the y-axis
> dp4dsFit <- function(dataFrame, indepVarName, depVarName, xLabel, yLabel) {
> library(ggplot2)
> library(labeling)
> dp4dsQuadraticFit <- lm(dataFrame[,depVarName] ~ poly(dataFrame[,indepVarName],2))
> write("=============\r",file="")
> write("Quadratic fit\r",file="")
> write("=============\r",file="")
> print(summary(dp4dsQuadraticFit))
> dp4dsNlogNFit <- lm(dataFrame[,depVarName] ~ dataFrame[,indepVarName]:log(dataFrame[,indepVarName]) + dataFrame[,indepVarName])
> write("==========\r",file="")
> write("n lg n fit\r",file="")
> write("==========\r",file="")
> print(summary(dp4dsNlogNFit))
> ggplot() +
> geom_point(data = dataFrame, aes_string(x = indepVarName, y = depVarName), size = 3) +
> geom_smooth(data = dataFrame, aes_string(x = indepVarName, y = depVarName),
> method = "lm", se = FALSE, colour = "RED", formula = y ~ poly(x,2)) +
> geom_smooth(data = dataFrame, aes_string(x = indepVarName, y = depVarName),
> method = "lm", se = FALSE, colour = "BLUE", formula = y ~ x:log(x) + x) +
> xlab(label = xLabel) +
> ylab(label = yLabel)
> }
>
>
> I use ggplot to produce the plot, but I cannot figure out how to produce the legend. Every example I have seen assumes a separate entry in the legend for each set of data. The problem is I have a single set of data with two different curve fits. How do I make a legend with red for the quadratic curve fit and blue for the n lg n curve fit?
>
> Another minor question. Are the above write statements the best way to echo a message to the console?
>
> Thanks,
> Stan
>
> J. Stanley Warford
> Professor of Computer Science
> Pepperdine University
> Malibu, CA 90263
> Stan.Warford at pepperdine.edu
> 310-506-4332
>
>