[R] Regroup and create new dataframe
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Jun 1 20:58:13 CEST 2018
Hello,
I don't understand why you are splitting data1 and then unlisting the
result.
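As a rough illustration of what the unlisting step does, here is a minimal sketch on a made-up three-row data set (`toy` is a hypothetical name, not your data). unlist() flattens every column of every piece into one long named vector, so as.data.frame() can only give back a single column, with the vector names as row names.

```r
# Hypothetical toy data, only to show the effect of unlist() on a split list
toy <- data.frame(Product_name = c("3M", "Avery", "3M"),
                  Sales = c(10, 20, 30),
                  stringsAsFactors = FALSE)

X  <- split(toy, toy$Product_name)  # list of data frames, one per product
X1 <- unlist(X)                     # one flat named vector, everything coerced to character

names(X1)                           # names combine product, column and row index
new_df <- as.data.frame(X1)
dim(new_df)                         # 6 rows, 1 column named "X1"
```

This seems to match what View(new_df) showed: the "left column" is just the row names and the only real column is X1.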
If you want to apply a modeling function to each of the sub-data-frames
(one per Product_name), you can follow more or less these steps:
0. Create a dataset
set.seed(9376) # Make the results reproducible
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE)
)
head(data1)
1. Split the dataset by product name. This gives a list of sub-data-frames.
X <- split(data1, data1$Product_name)
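Before modeling, it may be worth checking what split() returned. The sketch below rebuilds the simulated data from step 0 so that it runs on its own, then looks at the names and sizes of the pieces; nothing here is specific to your real data.

```r
# Rebuild the simulated data from step 0 so the example is self-contained
set.seed(9376)
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE))

X <- split(data1, data1$Product_name)

class(X)                 # a plain list
names(X)                 # one element per product name
sapply(X, nrow)          # number of rows in each sub-data-frame
head(X[["3M"]])          # each element is still a full data frame
```

Each element keeps all four columns, so there is no need to unlist anything before fitting models.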
2. Now lapply a modeling function to each sub-data-frame.
modelFun <- function(DF){
    lm(Sales ~ Region, data = DF)
}
model_list <- lapply(X, modelFun )
model_smry <- lapply(model_list, summary)
model_smry[[1]]
#
#Call:
#lm(formula = Sales ~ Region, data = DF)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-487.41 -196.17    1.76  195.96  498.48
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  437.300    108.147   4.044 0.000355 ***
#Regionb      437.019    167.540   2.608 0.014229 *
#Regionc      102.989    179.341   0.574 0.570217
#Regiond      105.520    152.942   0.690 0.495721
#Regione       -5.638    138.342  -0.041 0.967773
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 286.1 on 29 degrees of freedom
#Multiple R-squared:  0.2426, Adjusted R-squared:  0.1381
#F-statistic: 2.322 on 4 and 29 DF,  p-value: 0.08039
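If you would rather compare the products side by side than read three separate summaries, one possible way (a sketch, not the only option) is to stack the coefficient tables into a single data frame. The names coef_list and coef_df are made up for this example, and the data are the simulated ones from step 0, rebuilt here so the code runs on its own.

```r
# Rebuild the simulated data and the per-product models
set.seed(9376)
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE))
X <- split(data1, data1$Product_name)
model_list <- lapply(X, function(DF) lm(Sales ~ Region, data = DF))

# One coefficient table per product, labelled with the product name
coef_list <- lapply(names(model_list), function(nm) {
    cf <- coef(summary(model_list[[nm]]))   # matrix: Estimate, Std. Error, ...
    data.frame(Product_name = nm, Term = rownames(cf), cf,
               check.names = FALSE)
})

coef_df <- do.call(rbind, coef_list)        # stack the tables
rownames(coef_df) <- NULL                   # drop the term names used as row names
head(coef_df)
```

The result has one row per product and model term, which is easier to sort, filter or plot than a list of summaries.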
Hope this helps,
Rui Barradas
At 16:54 on 01-06-2018, nguy2952 University of Minnesota wrote:
> Hello folks,
>
> I have a big project to work on and the dataset is classified so I am just
> going to use my own example so everyone can understand what I am targeting.
>
> Let's take Target as an example: We consider three brands of tape: Target
> brand, 3M and Avery. The original data frame has 4 columns: Year of Record,
> Product_Name(which contains three brands of tape), Sales, and Region. I
> want to create a new data frame that looks like this:
>
>                 Year of Record    Sales    Region
> Target Brand
> 3M
> Avery
>
> Here is what I did.
>
> 1.
>
> I split the original data frame which I called data1:
>
> X = split(data1, Product_name)
>
> 2.
>
> Unlist X
>
> X1 = unlist(X)
>
> 3.
>
> Create a new data frame
>
> new_df = as.data.frame(X1)
>
>
> But, when I used the command View(new_df), I had only two columns: The left
> one is similar to TargetBrand.Sales, etc. and the right one is just "X1"
>
> I did not achieve what I wanted.
>
> *A potentially big question from readers:*
>
> Why am I doing this?
>
> *Answer:*
>
> I want to run a multiple regression model later to see among different
> regions, what the sales look like for these three brands of tape:
>
> *Does Mid-west buy more house brand than East Coast?*
>
> or
>
> *Does region really affect the sales? Are Mid-West's purchases similar to
> those of East Coast and West Coast?*
>
> I need help. Please give me guidance.
>
> Sincerely,
> Hugh N