[R] Regroup and create new dataframe

Rui Barradas
Fri Jun 1 20:58:13 CEST 2018


Hello,

I don't understand why you are splitting data1 and then unlisting the 
result.
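
To see what the split-then-unlist route does, here is a minimal sketch on a made-up data frame (the name toy and its values are only illustrative, they are not your data). As far as I can tell, unlist() flattens every column of every sub-data-frame into one long named vector, so as.data.frame() can only build a single column from it; the "left column" shown by View() would then just be the row names.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(Product_name = c("3M", "Avery", "3M", "Avery"),
                  Sales = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

X  <- split(toy, toy$Product_name)   # a list of two data frames
X1 <- unlist(X)                      # one long named vector; the column structure is gone

is.list(X)        # the split result is a list
is.character(X1)  # everything was coerced to a single atomic type
head(names(X1))   # names combine product, column name and a running index

new_df <- as.data.frame(X1)
dim(new_df)       # many rows, but only one column
names(new_df)     # that column is named after the object, "X1"
```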

If you want to apply a modeling function to each of the sub-data-frames 
(subdf's), split by product name, you can follow more or less these steps:

0. Create a dataset

set.seed(9376)    # Make the results reproducible

n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                     Year_of_Record = sample(2011:2018, n, TRUE),
                     Sales = runif(n, 10, 1000),
                     Region = sample(letters[1:5], n, TRUE)
                     )

head(data1)


1. Split the dataset by product name. This gives a list of subdf's.


X <- split(data1, data1$Product_name)
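
A quick way to check the result of the split, sketched here with the same example data rebuilt so the snippet runs on its own: each element of X should still be a data frame with all four columns, and the row counts should add up to n.

```r
# Rebuild the example data from step 0 so this snippet is self-contained
set.seed(9376)
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE))

X <- split(data1, data1$Product_name)

names(X)                  # one list element per product name
sapply(X, nrow)           # number of rows in each subdf
sapply(X, is.data.frame)  # every element is still a data frame
str(X[["3M"]])            # same four columns as data1
```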


2. Now lapply a modeling function to each subdf.


modelFun <- function(DF){

     lm(Sales ~ Region, data = DF)

}

model_list <- lapply(X, modelFun )
model_smry <- lapply(model_list, summary)
model_smry[[1]]
#
#Call:
#lm(formula = Sales ~ Region, data = DF)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-487.41 -196.17    1.76  195.96  498.48
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  437.300    108.147   4.044 0.000355 ***
#Regionb      437.019    167.540   2.608 0.014229 *
#Regionc      102.989    179.341   0.574 0.570217
#Regiond      105.520    152.942   0.690 0.495721
#Regione       -5.638    138.342  -0.041 0.967773
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 286.1 on 29 degrees of freedom
#Multiple R-squared:  0.2426,    Adjusted R-squared:  0.1381
#F-statistic: 2.322 on 4 and 29 DF,  p-value: 0.08039
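
If one table is easier to compare than three separate summaries, a possible follow-up is to stack the coefficient tables of all the models into a single data frame. This is only a sketch, not part of the steps above; coef_tab and the column name Term are names I made up, and the data are rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data and the per-product models
set.seed(9376)
n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE))

X <- split(data1, data1$Product_name)
model_list <- lapply(X, function(DF) lm(Sales ~ Region, data = DF))

# For each product, take the coefficient matrix from summary() and
# turn it into a data frame labelled with the product name
coef_tab <- do.call(rbind, lapply(names(model_list), function(nm) {
  cf <- coef(summary(model_list[[nm]]))
  data.frame(Product_name = nm,
             Term = rownames(cf),
             cf,
             row.names = NULL,
             check.names = FALSE)   # keep names like "Std. Error" as they are
}))

head(coef_tab)
```

The result has one row per product and model term, so the Region estimates of the three brands can be read side by side or filtered with subset().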

Hope this helps,


Rui Barradas


At 16:54 on 01-06-2018, nguy2952 University of Minnesota wrote:
> Hello folks,
>
> I have a big project to work on and the dataset is classified so I am just
> going to use my own example so everyone can understand what I am targeting.
>
> Let's take Target as an example: We consider three brands of tape: Target
> brand, 3M and Avery. The original data frame has 4 columns: Year of Record,
> Product_Name(which contains three brands of tape), Sales, and Region. I
> want to create a new data frame that looks like this:
>
>                        Year of Record       Sales     Region
>    Target Brand
>    3M
>    Avery
>
> Here is what I did.
>
>     1. I split the original data frame, which I called data1:
>
>        X = split(data1, Product_name)
>
>     2. Unlist X:
>
>        X1 = unlist(X)
>
>     3. Create a new data frame:
>
>        new_df = as.data.frame(X1)
>
>
> But when I used the command View(new_df), I had only two columns: the left
> one is similar to TargetBrand.Sales, etc., and the right one is just "X1".
>
> I did not achieve what I wanted.
>
> *A potentially big question from readers:*
>
> Why am I doing this?
>
> *Answer:*
>
> I want to run a multiple regression model later to see among different
> regions, what the sales look like for these three brands of tape:
>
> *Does Mid-west buy more house brand than East Coast?*
>
> or
>
> *Does region really affect the sales? Are Mid-West's purchases similar to
> those of East Coast and West Coast?*
>
> I need help. Please give me guidance.
>
> Sincerely,
> Hugh N