[R] Regression Tree Questions

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Feb 24 21:09:13 CET 2018


As Bert implies, you may be getting ahead of yourself. An 8 may be a number, or it may be the character "8", or it could be a factor level, and you don't seem to know the difference yet (hence the suggestion to work through tutorials). If you go to the trouble of making a reproducible example [1][2][3], then you may find the problem yourself, or we will be able to use the example to check things that you would not think to try. The str function can help you find problems like the one above.
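
To make the distinction concrete, here is a minimal sketch of how str reports the three kinds of 8, and of what happens when a numeric-looking column is stored as a factor. The data frame homes and its column names sqft and age are hypothetical stand-ins, not your actual data; the values are copied from the first rows of your snippet. The error message you quoted reports Sq. Feet as a factor, which is why this conversion is worth checking.

```r
# Three ways the "same" 8 can be stored in R
x_num <- 8            # numeric
x_chr <- "8"          # character
x_fac <- factor("8")  # factor with a single level, "8"

str(x_num)
str(x_chr)
str(x_fac)

# Hypothetical data frame in which square footage ended up as a factor
homes <- data.frame(sqft = factor(c("1620", "1864", "1628")),
                    age  = c(17, 28, 15))
str(homes)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(homes$sqft)
print(codes)

# Converting through character recovers the values that were intended
homes$sqft <- as.numeric(as.character(homes$sqft))
str(homes)
```

A model fitted on a factor column only knows the levels it saw during training, so any value that did not occur there counts as a "new level" at prediction time.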

One surprisingly valuable step mentioned in the reprex references below is giving us the data for your example using the dput function. Another surprisingly useful technique is sending your question in plain text email format, as the Posting Guide indicates (the details of how to do that depend on your email client, which is off topic here).
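
As a rough illustration of the dput step, the sketch below builds a small data frame, prints the R code that recreates it, and checks that the printed code reproduces the object. The name homes and the column names sqft, age and price are hypothetical; the values are the first three rows of the snippet you posted. With your real data you would run something like dput(head(Crankshaw)) and paste the output into your email.

```r
# Hypothetical stand-in for the first rows of the real data set
homes <- data.frame(sqft  = c(1620, 1864, 1628),
                    age   = c(17, 28, 15),
                    price = c(185500, 195250, 190750))

# Print R code that recreates the object, including its column types
dput(homes)

# Round trip: capture that code as text, evaluate it, and compare
txt    <- paste(capture.output(dput(homes)), collapse = "\n")
homes2 <- eval(parse(text = txt))
same   <- isTRUE(all.equal(homes, homes2))
print(same)
```

Unlike a table pasted into an email, the dput output preserves whether each column is numeric, character or factor, which is exactly the information that gets lost otherwise.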

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)
-- 
Sent from my phone. Please excuse my brevity.

On February 24, 2018 11:16:27 AM PST, Gary Black <gwblack001 at sbcglobal.net> wrote:
>Hi All,
>
>I'm a newbie and have two questions.  Please pardon me if they are very
>basic.
>
>
>1.  I'm using a regression tree to predict the selling prices of 10 new
>records (homes).  The following code is resulting in an error message: 
>pred <- predict(model, newdata = outOfSample[, -6]) 
>
>The error message is:
>
>Error in model.frame.default(Terms, newdata, na.action = na.action,
>xlev = attr(object,  : 
>factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211,
>2265, 2530, 2672, 3365
>
>
>Does anybody know what is causing this?  I've pasted a snippet of my
>original dataset (Crankshaw) and my out-of-sample dataset below.  Below
>it appears all code which I entered leading up to that point.  The
>error message appears at the end of that code.
>
>
>2.  How can I get the regression tree to display in a more "friendly"
>way?  Unfortunately I cannot paste a picture of it in this email, but
>it displays the values of individual records at each node instead of
>the decision rule logic (e.g., Age >= 28).  I'm using the command >
>fancyRpartPlot(model) to display the tree.
>
>
>Thank you!
>Gary
>
>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>Original Data (Crankshaw):
>
>Sq. Feet		Age	Bedrm	Bathrm	Garage	Sell Price ($)
>1620		17	3	2	2	185500
>1864		28	3	2	2	195250
>1628		15	3	2	2	190750
>1670		1	4	3	2	195750
>1762		23	3	4	2	197250
>1520		1	3	3	2	192900
>
>
>Out-of-Sample Data:
>
>NEW RECORDS:					
>Sq. Feet		Age	Bedrm	Bathrm	Garage	Sell Price ($)
>3365		8	4	4	3	
>1547		28	3	2	2	
>1375		36	2	1	1	
>1621		53	3	1	2	
>2530		23	4	3	2	
>1868		42	3	2	2	
>2211		23	3	2	2	
>1421		39	2	1	1	
>2672		3	4	2	3	
>2265		7	3	2	2	
>
>
>All Code Entered:
>
>> Crankshaw <- read_excel("C:/Data/Excel/Crankshaw.xlsx")
>> View(Crankshaw)
>> outOfSample <- Crankshaw[305:nrow(Crankshaw), ]
>> Crankshaw <- Crankshaw[1:300, ]
>> install.packages("caret")
>Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
>(as ‘lib’ is unspecified)
>trying URL
>'https://cran.rstudio.com/bin/windows/contrib/3.4/caret_6.0-78.zip'
>Content type 'application/zip' length 5155836 bytes (4.9 MB)
>downloaded 4.9 MB
>
>package ‘caret’ successfully unpacked and MD5 sums checked
>
>The downloaded binary packages are in
>	C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
>> install.packages("rattle")
>Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
>(as ‘lib’ is unspecified)
>trying URL
>'https://cran.rstudio.com/bin/windows/contrib/3.4/rattle_5.1.0.zip'
>Content type 'application/zip' length 1287407 bytes (1.2 MB)
>downloaded 1.2 MB
>
>package ‘rattle’ successfully unpacked and MD5 sums checked
>
>The downloaded binary packages are in
>	C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
>> library(rpart)
>> library(caret)
>Loading required package: lattice
>Loading required package: ggplot2
>Warning messages:
>1: package ‘caret’ was built under R version 3.4.3 
>2: package ‘ggplot2’ was built under R version 3.4.3
>> library(rattle)
>> n <- nrow(Crankshaw)
>> train <- sample(1:n, size = 0.5 * n, replace = FALSE)
>> CrankshawTrain <- Crankshaw[train, ]
>> temp <- (1:n)[-train]
>> val <- sample(temp, size = (0.3 / 0.5) * length(temp), replace =
>FALSE)
>> CrankshawVal <- Crankshaw[val, ]
>> test <- (1:n)[-c(train, val)]
>> CrankshawTest <- Crankshaw[test, ]
>> model <- rpart(`Selling Price ($)` ~ ., method = "anova", data =
>CrankshawTrain)
>> fancyRpartPlot(model)
>> pred <- predict(model, newdata = outOfSample[, -6])
>Error in model.frame.default(Terms, newdata, na.action = na.action,
>xlev = attr(object,  : 
>factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211,
>2265, 2530, 2672, 3365
>
>
>
