[R] Regression Tree Questions
Gary Black
gwblack001 at sbcglobal.net
Sat Feb 24 20:16:27 CET 2018
Hi All,
I'm a newbie and have two questions. Please pardon me if they are very basic.
1. I'm using a regression tree to predict the selling prices of 10 new records (homes). The following code results in an error message:
pred <- predict(model, newdata = outOfSample[, -6])
The error message is:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365
Does anybody know what is causing this? Below I've pasted a snippet of my original dataset (Crankshaw) and my out-of-sample dataset, followed by all the code I entered up to that point. The error message appears at the end of that code.
2. How can I get the regression tree to display in a more "friendly" way? Unfortunately I cannot paste a picture of it in this email, but at each node it displays the values of individual records instead of the decision rule logic (e.g., Age >= 28). I'm using the command fancyRpartPlot(model) to display the tree.
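A minimal sketch of one possible explanation for the display in item 2, assuming (not confirmed in this thread) that Sq. Feet reached rpart() as a factor, as the error message in item 1 suggests. A factor predictor is split by sending groups of individual values to each side, while a numeric predictor is split by a threshold rule. The names sqft and price are hypothetical stand-ins for the post's columns, the values are the six sample rows below, and rpart.control(minsplit = 2, xval = 0) is chosen only so that such a tiny data set can split at all.

```r
library(rpart)  # recommended package that ships with R

# The six sample rows from the original data, under hypothetical names
sqft  <- c(1620, 1864, 1628, 1670, 1762, 1520)
price <- c(185500, 195250, 190750, 195750, 197250, 192900)

# minsplit = 2 lets six rows split; xval = 0 skips cross-validation
ctrl <- rpart.control(minsplit = 2, xval = 0)

# sqft as a factor: each split sends a list of individual values left or right
fit_factor <- rpart(price ~ sqft, method = "anova", control = ctrl,
                    data = data.frame(sqft = factor(sqft), price = price))

# sqft as a number: each split is a threshold rule on sqft
fit_numeric <- rpart(price ~ sqft, method = "anova", control = ctrl,
                     data = data.frame(sqft = sqft, price = price))

print(fit_factor)   # node labels list groups of sqft values
print(fit_numeric)  # node labels compare sqft with a cut point
```

If this is the cause, converting the column to numbers before fitting should make fancyRpartPlot() show threshold rules for it.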
Thank you!
Gary
------------------------------------------------------------------------
Original Data (Crankshaw):
Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
    1620   17      3       2       2          185500
    1864   28      3       2       2          195250
    1628   15      3       2       2          190750
    1670    1      4       3       2          195750
    1762   23      3       4       2          197250
    1520    1      3       3       2          192900
Out-of-Sample Data:
NEW RECORDS:
Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
    3365    8      4       4       3
    1547   28      3       2       2
    1375   36      2       1       1
    1621   53      3       1       2
    2530   23      4       3       2
    1868   42      3       2       2
    2211   23      3       2       2
    1421   39      2       1       1
    2672    3      4       2       3
    2265    7      3       2       2
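A minimal sketch of one possible cause of the error quoted in item 1 (an assumption; the thread does not confirm it). The message says rpart() treated Sq. Feet as a factor, so each square-footage value in the new records counts as a category never seen in training, and predict() refuses it. The sketch builds that factor explicitly with factor(); toy_train, new_homes, sqft and price are hypothetical names, and the values come from the two tables above.

```r
library(rpart)  # recommended package that ships with R

# Hypothetical stand-ins for Sq. Feet and Sell Price ($); sqft is a factor,
# as the error message says Sq. Feet was
toy_train <- data.frame(
  sqft  = factor(c("1620", "1864", "1628", "1670", "1762", "1520")),
  price = c(185500, 195250, 190750, 195750, 197250, 192900)
)
new_homes <- data.frame(sqft = factor(c("3365", "1547")))

# minsplit = 2 lets a six-row data set split; xval = 0 skips cross-validation
ctrl <- rpart.control(minsplit = 2, xval = 0)
fit  <- rpart(price ~ sqft, data = toy_train, method = "anova", control = ctrl)

# predict() compares the new factor levels with the training levels and stops
msg <- tryCatch({
  predict(fit, newdata = new_homes)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Convert via as.character() first: as.numeric() on a factor returns level codes
toy_train$sqft <- as.numeric(as.character(toy_train$sqft))
new_homes$sqft <- as.numeric(as.character(new_homes$sqft))
fit  <- rpart(price ~ sqft, data = toy_train, method = "anova", control = ctrl)
pred <- predict(fit, newdata = new_homes)
print(pred)
```

The same conversion would have to be applied to the real column in both Crankshaw and outOfSample before calling rpart().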
All Code Entered:
> Crankshaw <- read_excel("C:/Data/Excel/Crankshaw.xlsx")
> View(Crankshaw)
> outOfSample <- Crankshaw[305:nrow(Crankshaw), ]
> Crankshaw <- Crankshaw[1:300, ]
> install.packages("caret")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/caret_6.0-78.zip'
Content type 'application/zip' length 5155836 bytes (4.9 MB)
downloaded 4.9 MB
package ‘caret’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> install.packages("rattle")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/rattle_5.1.0.zip'
Content type 'application/zip' length 1287407 bytes (1.2 MB)
downloaded 1.2 MB
package ‘rattle’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> library(rpart)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Warning messages:
1: package ‘caret’ was built under R version 3.4.3
2: package ‘ggplot2’ was built under R version 3.4.3
> library(rattle)
> n <- nrow(Crankshaw)
> train <- sample(1:n, size = 0.5 * n, replace = FALSE)
> CrankshawTrain <- Crankshaw[train, ]
> temp <- (1:n)[-train]
> val <- sample(temp, size = (0.3 / 0.5) * length(temp), replace = FALSE)
> CrankshawVal <- Crankshaw[val, ]
> test <- (1:n)[-c(train, val)]
> CrankshawTest <- Crankshaw[test, ]
> model <- rpart(`Selling Price ($)` ~ ., method = "anova", data = CrankshawTrain)
> fancyRpartPlot(model)
> pred <- predict(model, newdata = outOfSample[, -6])
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365
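The 50/30/20 train/validation/test split entered above can be wrapped in a small helper. This is a sketch, and the function name split_rows and the default seed are hypothetical; calling set.seed() before sample() makes the split reproducible between sessions, which the transcript's version is not.

```r
# Hypothetical helper reproducing the 50/30/20 split from the transcript
split_rows <- function(n, seed = 1) {
  set.seed(seed)                                    # reproducible draws
  train <- sample(1:n, size = 0.5 * n, replace = FALSE)
  temp  <- (1:n)[-train]                            # rows not used for training
  val   <- sample(temp, size = (0.3 / 0.5) * length(temp), replace = FALSE)
  test  <- (1:n)[-c(train, val)]                    # whatever is left over
  list(train = train, val = val, test = test)
}

idx <- split_rows(300)
lengths(idx)  # train 150, val 90, test 60
```

The three index vectors are disjoint and together cover rows 1 to n, so CrankshawTrain <- Crankshaw[idx$train, ] and so on would give the same kind of split as the transcript.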
More information about the R-help mailing list