[R] Creating data frame of predicted and actual values in R for plotting

Muhammad Bilal Muhammad2.Bilal at live.uwe.ac.uk
Wed May 11 12:36:04 CEST 2016


I have solved this with the following commands:

all_predictions <- data.frame(pid = testPFI$project_id, actual_delay = testPFI$project_delay,
                              lm_pred, tree_pred, best_tree_pred, rf_pred)

str(all_predictions)

all_pred <- sqldf("SELECT pid, actual_delay, ROUND(lm_pred,2) lm_pred,
                               ROUND(tree_pred,2) tree_pred,
                               ROUND(best_tree_pred,2) train_pred,
                               ROUND(rf_pred,2) rf_pred
                     FROM all_predictions
                      ORDER BY actual_delay")
all_pred
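
For readers without sqldf, the rounding and ordering step above could probably also be done in base R. The following is only a minimal sketch under stated assumptions: the four-row vectors are made-up stand-ins for testPFI and the predict() outputs, and as.numeric() is applied defensively in case a predict() method returns a one-column matrix rather than a plain vector.

```r
# Hypothetical stand-ins for testPFI and the predict() outputs (made-up numbers)
testPFI <- data.frame(project_id = 1:4, project_delay = c(0, -91, 30, -323))
lm_pred        <- c(-10.456, -80.123, 12.789, -250.555)
tree_pred      <- c(-5.111, -70.222, 20.333, -300.444)
best_tree_pred <- c(-6.004, -75.016, 18.127, -310.995)
# Stored as a one-column matrix on purpose, to show why as.numeric() is used
rf_pred        <- matrix(c(-2.349, -88.761, 25.502, -299.128), ncol = 1)

all_predictions <- data.frame(
  pid          = testPFI$project_id,
  actual_delay = testPFI$project_delay,
  lm_pred      = as.numeric(lm_pred),
  tree_pred    = as.numeric(tree_pred),
  train_pred   = as.numeric(best_tree_pred),  # same alias as in the SQL above
  rf_pred      = as.numeric(rf_pred)
)

# Round only the prediction columns to two decimals
pred_cols <- c("lm_pred", "tree_pred", "train_pred", "rf_pred")
all_predictions[pred_cols] <- round(all_predictions[pred_cols], 2)

# Sort rows by the actual delay, as ORDER BY does in the query
all_pred <- all_predictions[order(all_predictions$actual_delay), ]
rownames(all_pred) <- NULL
all_pred
```

This keeps the same column names as the sqldf result, so the plotting code can be used unchanged.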

# Plot the actual delay and all the predictions on one graph
ggplot(all_pred, aes(x=pid)) + geom_line(aes(y=actual_delay), colour="blue") +
  geom_line(aes(y=lm_pred), colour="red", size=1)  +
  geom_line(aes(y=tree_pred), colour="green", size=1)  +
  geom_line(aes(y=train_pred), colour="yellow", size=1)  +
  geom_line(aes(y=rf_pred), colour="black", size=1)
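
The plot above has no legend, because each colour is set outside aes(). One possible alternative, sketched here under assumptions, is to reshape the data to long format and map colour to the series name. The data frame below is a made-up stand-in for all_pred, and the names long_pred, delay and series are chosen only for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for all_pred (made-up numbers)
all_pred <- data.frame(
  pid          = 1:4,
  actual_delay = c(-323, -91, 0, 30),
  lm_pred      = c(-250.56, -80.12, -10.46, 12.79),
  tree_pred    = c(-300.44, -70.22, -5.11, 20.33),
  train_pred   = c(-311.00, -75.02, -6.00, 18.13),
  rf_pred      = c(-299.13, -88.76, -2.35, 25.50)
)

value_cols <- setdiff(names(all_pred), "pid")

# stack() puts all value columns into one column, plus a factor naming the source column
long_pred <- data.frame(
  pid = rep(all_pred$pid, times = length(value_cols)),
  stack(all_pred[value_cols])
)
names(long_pred) <- c("pid", "delay", "series")

# One geom_line() call; colour is mapped to the series, so a legend is drawn
p <- ggplot(long_pred, aes(x = pid, y = delay, colour = series)) +
  geom_line() +
  scale_colour_manual(values = c(actual_delay = "blue",
                                 lm_pred      = "red",
                                 tree_pred    = "green",
                                 train_pred   = "yellow",
                                 rf_pred      = "black"))
print(p)
```

Adding another model then only requires one more column in all_pred and one more entry in the colour vector.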

So this is now solved.

Many Thanks and

Kind Regards
--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bilal at live.uwe.ac.uk


________________________________________
From: Muhammad Bilal
Sent: 11 May 2016 01:06:32
To: r-help at r-project.org
Subject: Re: [R] Creating data frame of predicted and actual values in R for    plotting

Please disregard any typos in the object names passed to predict() for some of the models; each call is meant to use the model fitted immediately above it.

Sent from my iPhone

> On 11 May 2016, at 12:47 am, Muhammad Bilal <Muhammad2.Bilal at live.uwe.ac.uk> wrote:
>
> Hi All,
>
>
> I have the following dataset:
>
>
>> str(pfi_v3)
> 'data.frame': 714 obs. of  8 variables:
> $ project_id             : int  1 2 3 4 5 6 7 8 9 10 ...
> $ project_lat            : num  51.4 51.5 52.2 51.5 53.5 ...
> $ project_lon            : num  -0.642 -1.85 0.08 0.126 -1.392 ...
> $ sector                 : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
> $ project_duration       : int  1826 3652 121 520 1087 730 730 730 790 522 ...
> $ project_delay          : int  -323 0 -60 0 0 0 0 0 0 -91 ...
> $ capital_value          : num  6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 ...
> $ contract_type          : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 2 2 2 2 2 ...
>
>
> I'm using the following commands to create training and test sets:
>
> split <- sample.split(pfi_v3$project_delay, SplitRatio = 0.8)
> trainPFI <- subset(pfi_v3, split == TRUE)
> testPFI <- subset(pfi_v3, split == FALSE)
>
>
> I am using several predictive models to estimate delay in projects.
>
>
> The commands are given below:
>
>
> 1. Simple linear regression
>
> lm_m <- lm(project_delay ~ project_lon +
>                            project_lat +
>                            project_duration +
>                            sector +
>                            contract_type +
>                            capital_value,
>            data = trainPFI)
>
> lm_pred <- predict(lm_m, newdata = testPFI)
>
>
> 2. Regression tree
>
> tree_m <- rpart(project_delay ~ project_lon +
>                                                          project_lat +
>                                                          project_duration +
>                                                          sector +
>                                                          contract_type +
>                                                          capital_value,
>                                data = trainPFI)
>
> tree_pred <- predict(tree_m, newdata = testPFI)
>
> 3. Cp-optimised regression tree
>
> train_m <- train(project_delay ~ project_lon +
>                                                           project_lat +
>                                                           project_duration +
>                                                           sector +
>                                                           contract_type +
>                                                           capital_value,
>                     data = trainPFI,
>                     method="rpart",
>                     trControl=tr.control, tuneGrid = cp.grid)
>
>
> train_pred <- predict(train_m, newdata = testPFI)
>
>
> 4. Random Forest
>
> rf_m <- randomForest(project_delay ~ project_lon +
>                       project_lat +
>                       project_duration +
>                       sector +
>                       contract_type +
>                       capital_value,
>                     data = trainPFI,
>                     importance=TRUE,
>                     ntree = 2000)
>
> rf_pred <- predict(rf_m, newdata = testPFI)
>
> 5. Conditional Forest
> cf_m <- cforest(project_delay ~ project_lon +
>                       project_lat +
>                       project_duration +
>                       sector +
>                       contract_type +
>                       capital_value,
>                     data = trainPFI,
>                     controls=cforest_unbiased(ntree=2000, mtry=3))
>
> cf_pred <- predict(cf_m, testPFI, OOB=TRUE, type = "response")
>
> That is it.
>
>
> Now I want to create a new data frame to combine the actual and predicted values such that the new frame has the following columns:
>
> $project_id
>
> $actual_delay
>
> $lm_predicted_delay
>
> $tree_predicted_delay
>
> $train_predicted_delay
>
> $rf_predicted_delay
>
> $cf_predicted_delay
>
>
> I want to use this data frame to draw a line chart comparing the predictions.
>
>
> How can I achieve this?
>
>
> Any help will be highly appreciated.
>
>
> Many Thanks and
>
>
> Kind Regards
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> muhammad2.bilal at live.uwe.ac.uk
>
>


