[R] How to avoid the three loops in R?
John McKown
john.archie.mckown at gmail.com
Fri Aug 1 16:06:50 CEST 2014
On Fri, Aug 1, 2014 at 6:41 AM, Lingyi Ma <lingyi.ma at gmail.com> wrote:
> I have the following data set:
>
> Country Product Price Year_Month
> AE 1 20 201204
> DE 1 20 201204
> CN 1 28 201204
> AE 2 28 201204
> DE 2 28 201204
> CN 2 22 201204
> AE 3 28 201204
> CN 3 28 201204
> AE 1 20 201205
> DE 1 20 201205
> CN 1 28 201205
> AE 2 28 201205
> DE 2 28 201205
>
> I want to create the one more column which is "The average price of the
> product in other areas".
> in other word, for each month, for each product, I calculate the average of
> such product in the other area.
>
> I want sth like:
>
> Country Product Price Year_Month Price_average_In_Other_area
> AE 1 20 201204 14
> AE 2 28 201204 25
The output above looks wrong. Shouldn't the Price_average_In_Other_area for AE,
product 1, 201204 be 24, i.e. (20 + 28) / 2 from the DE and CN prices?
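A quick check of that figure, using only the three product 1 prices for 201204 from the quoted data (the names prices and other_avg_AE are just for illustration):

```r
# Product 1 prices for 201204, taken from the quoted data.
prices <- c(AE = 20, DE = 20, CN = 28)

# Mean over every country except AE.
other_avg_AE <- mean(prices[names(prices) != "AE"])
print(other_avg_AE)
```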
My possible solution:
# Initialize data.frame & call it "x".
Country <- c("AE","DE","CN","AE","DE","CN","AE","CN","AE","DE","CN","AE","DE");
Product <- c(1,1,1,2,2,2,3,3,1,1,1,2,2);
Price <- c(20,20,28,28,28,22,28,28,20,20,28,28,28);
Year_Month <- c(201204,201204,201204,201204,201204,201204,201204,201204,201205,201205,201205,201205,201205);
x <- data.frame(Country,Product,Price,Year_Month,stringsAsFactors=FALSE);
#
#
library("dplyr");
#
# Get the total Price and the number of Price entries (i.e. areas) for each
# Product & Year_Month.
y <- summarize(group_by(x, Product,
Year_Month),sumPrice=sum(Price),NoPrice=length(Price));
#
# Merge the above data back into the original data.frame, based on
# Product and Year_Month (similar to SQL inner join).
x <- merge(x=x,y=y);
#
# Now calculate the "other area" average by subtracting the price in this area
# from the total price in all areas and dividing by the number of areas, minus one.
# Please note that if a Product and Year_Month is unique, i.e. no other areas
# for this Product & Year_Month, both the numerator and the denominator are
# zero. In R, 0/0 gives "NaN" as an answer (not "Inf").
x$Price_average_In_Other_area <- (x$sumPrice-x$Price)/(x$NoPrice-1);
# Possible alternative that handles the single-area case by returning NA instead
x$Avg_other <- ifelse(x$NoPrice>1,(x$sumPrice-x$Price)/(x$NoPrice-1),NA);
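For comparison, here is a minimal sketch of the same leave-one-out average in base R only, using ave() instead of summarize() plus merge(). It is not part of the solution above, just one possible variant. The helper names grp_sum and grp_n are made up for this example. One practical difference: merge() sorts the result on the key columns by default, while ave() returns values aligned with the original row order.

```r
# Rebuild the example data so this sketch runs on its own.
x <- data.frame(
  Country    = c("AE","DE","CN","AE","DE","CN","AE","CN","AE","DE","CN","AE","DE"),
  Product    = c(1,1,1,2,2,2,3,3,1,1,1,2,2),
  Price      = c(20,20,28,28,28,22,28,28,20,20,28,28,28),
  Year_Month = c(rep(201204, 8), rep(201205, 5)),
  stringsAsFactors = FALSE
)

# Per-row group totals and group sizes for each Product & Year_Month,
# returned in the same order as the rows of x.
grp_sum <- ave(x$Price, x$Product, x$Year_Month, FUN = sum)
grp_n   <- ave(x$Price, x$Product, x$Year_Month, FUN = length)

# Average over the other areas; NA where a Product & Year_Month
# appears in only one area.
x$Avg_other <- ifelse(grp_n > 1, (grp_sum - x$Price) / (grp_n - 1), NA)
print(x)
```

Everything here is vectorized, so there are no explicit loops over rows. How it performs on roughly a million rows would still need to be measured on the real data.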
>
> Please avoid the three for loop, I have tried and it never end. I have
> 1070427 rows. Is there better way to speed up my program?
--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan
Maranatha! <><
John McKown