[R] Dataframe Manipulation

Ulrik Stervbo ulrik.stervbo at gmail.com
Wed Aug 30 11:32:23 CEST 2017


Hi Hemant,

Does this help you along?

table_1 <- textConnection("Item_1;Item_2;Item_3
1KG banana;300ML milk;1kg sugar
2Large Corona_Beer;2pack Fries;
2 Lux_Soap;1kg sugar;")

table_1 <- read.csv(table_1, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")

table_2 <- read.csv(table_2, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

library(tidyr)
library(dplyr)

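# Reshape the category table to long format: one row per (Category, Item) pair
table_2 <- gather(table_2, "Category", "Item")

# Reshape the transactions to long format and drop the empty cells
table_1 <- gather(table_1, "Foo", "Item") %>%
  filter(!is.na(Item))

# Split each entry into its quantity and the item name at the space
table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
                    sep = " ")

# Look up the category of each item, then restore the original entry
table_3 <- left_join(table_1, table_2, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  select(-Quantity)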

table_3 %>%
  group_by(Foo, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Category", value = "Item")
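
Note that Foo only records which column an entry came from, so the summary
above groups by column position. If each row of table_1 is meant to be one
transaction, a sketch along these lines keeps a row id instead. Transaction,
transactions, categories and baskets are names I made up, and the small data
frames are only a cut-down stand-in for the example data:

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature versions of the two tables
transactions <- data.frame(
  Item_1 = c("1KG banana", "2Large Corona_Beer", "2 Lux_Soap"),
  Item_2 = c("300ML milk", "2pack Fries", "1kg sugar"),
  Item_3 = c("1kg sugar", NA, NA),
  stringsAsFactors = FALSE
)

categories <- data.frame(
  Category = c("Fruits", "Dairy Products", "Dairy Products",
               "Beverages", "Snacks", "Toiletries"),
  Item = c("banana", "milk", "sugar", "Corona_Beer", "Fries", "Lux_Soap"),
  stringsAsFactors = FALSE
)

baskets <- transactions %>%
  mutate(Transaction = row_number()) %>%   # remember the original row
  gather("Foo", "Item", -Transaction) %>%  # long format, id column kept
  filter(!is.na(Item)) %>%
  separate(col = "Item", into = c("Quantity", "Item"), sep = " ") %>%
  left_join(categories, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  group_by(Transaction, Category) %>%      # one cell per row and category
  summarise(Item = paste(Item, collapse = ", ")) %>%
  ungroup() %>%
  spread(key = "Category", value = "Item")

baskets
```

Each row of baskets then corresponds to one row of the input, with NA where
a transaction has nothing in a category.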

You still need to figure out how to handle words written in different cases
and how to extract the quantity in a universal way. For the code above, I
corrected these things by hand in the example data.
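
For the case and quantity problem, something like the following might be a
starting point. It is only a sketch in base R: the unit list (kg, ml, large,
pack) is my guess from the example data, and raw, pattern, quantity, key and
parsed are hypothetical names.

```r
# Hypothetical raw entries with inconsistent case and spacing
raw <- c("1KG banana", "2Large Corona_Beer", "2 Large corona Beer",
         "300ML milk")

# Leading digits plus an optional unit word, then the item name.
# The unit list is an assumption; extend it to match the real data.
pattern <- "^([0-9]+ ?(kg|ml|large|pack)?) +(.*)$"

# Group 1 is the quantity, group 3 is the item name
quantity <- sub(pattern, "\\1", raw, ignore.case = TRUE, perl = TRUE)
item     <- sub(pattern, "\\3", raw, ignore.case = TRUE, perl = TRUE)

# Normalise: drop spaces inside the quantity, use one case throughout,
# and treat spaces and underscores in item names as the same thing
quantity <- toupper(gsub(" ", "", quantity))
key      <- gsub("[ _]+", "_", tolower(trimws(item)))

parsed <- data.frame(raw, quantity, key, stringsAsFactors = FALSE)
parsed
```

The same key would have to be computed for the items in table_2 before the
join, so that both sides match on the normalised name.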

HTH
Ulrik

On Wed, 30 Aug 2017 at 10:16 Hemant Sain <hemantsain55 at gmail.com> wrote:

> Hey PIKAL,
> It's not homework, and that is not the real dataset either. I have signed an
> NDA for my company, so I can't share the original data file. I'm working on
> a market basket analysis task, but I am not able to convert my existing data
> table to an appropriate format so that I can apply the Apriori algorithm
> using R. This is very important for me to get done, because I'm an intern
> and if I don't get it done they are not going to hire me as a full-time
> employee. I tried everything by myself but was not able to get it done.
> Your precious 10-15 minutes can save my upcoming years, so please help me
> through this if you can.
> I want another dataset based on the first two datasets I have mentioned.
>
> Thanks
>
> On 30 August 2017 at 12:49, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>
> > Hi
> >
> > It looks like homework to me, and this help list has a no-homework
> > policy.
> >
> > What do you want to do with your table 3? It seems to me futile.
> >
> > Anyway, some combination of melt, merge, cast and regular expressions
> > could be used for such a task, but it could be rather tricky.
> >
> > But be aware that "Suger" does not match "sugar" (and I am surprised
> > that sugar counts as a dairy product), and that you mix uppercase and
> > lowercase letters, which can also be problematic when matching words.
> >
> > Cheers
> > Petr
> >
> > > -----Original Message-----
> > > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
> > > Sent: Wednesday, August 30, 2017 8:28 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Dataframe Manipulation
> > >
> > > I want to do a market basket analysis and I'm trying to create a dataset
> > > for that. I have two tables. The first contains daily transactions of
> > > products; each row shows the items purchased by one customer. The second
> > > contains the parent groups that those products fall under; for example,
> > > the fruit category contains several fruits such as mango, banana and apple.
> > >
> > > I want to create a third table whose headers are the parent groups taken
> > > from Table 2 and whose rows represent the transactions of products with
> > > their names. If a transaction has nothing in a parent category, the cell
> > > should be filled with NA.
> > >
> > > Please help me with R or C/C++ code (R would be preferred). I am attaching
> > > all three tables for reference: I have the first two tables and want to
> > > get a table like Table 3.
> > >
> > > Tables are explained in the attached doc.
> > >
> > > --
> > > hemantsain.com
> >
> >
>
>
>
> --
> hemantsain.com
>
