[R] Dataframe Manipulation

Ulrik Stervbo ulrik.stervbo at gmail.com
Thu Aug 31 09:47:39 CEST 2017


Hi Hemant,

the solution is really quite similar, and the logic is identical:

library(readr)
library(dplyr)
library(stringr)
library(tidyr)

data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")

# Helper function: split one receipt's comma-separated items into a
# tibble with one row per item
split_items <- function(items){
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE) %>%
    str_trim()  # drop stray spaces so "a, b" still matches the category table

  tibble(Item = x, Purchase_ID = items$Purchase_ID)
}

# Give every receipt an ID, then expand each receipt into one row per item
data_help <-
  data_help %>%
  mutate(Purchase_ID = row_number()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.)) %>%
  ungroup()

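# Reshape the category table to long form (Category, Item), attach the
# purchases, and spread back to one row per purchase, one column per category
cat_help %>% gather("Category", "Item") %>%
  filter(!is.na(Item)) %>%
  # inner_join keeps only items that were actually purchased; a left_join
  # would add a spurious row with Purchase_ID = NA for the rest
  inner_join(data_help, by = "Item") %>%
  group_by(Category, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  ungroup() %>%
  spread(key = "Category", value = "Item")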
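
Since the real files cannot be shared, here is a small self-contained sketch of the same idea on made-up data. The two tibbles `receipts` and `categories` are hypothetical stand-ins for data_help.csv and cat_help.csv, and I am assuming the items on a receipt are separated by plain commas. It also uses `tidyr::separate_rows()` in place of the `do()` helper, which is only one possible alternative:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data_help.csv: one receipt per row
receipts <- tibble(
  Items_purchased_on_Receipts = c("banana,milk,sugar",
                                  "Corona_Beer,Fries",
                                  "Lux_Soap,sugar")
)

# Hypothetical stand-in for cat_help.csv: one column per category
categories <- tibble(
  Toiletries = c("Lux_Soap", NA),
  Fruits     = c("banana", NA),
  Beverages  = c("Corona_Beer", NA),
  Snacks     = c("Fries", NA),
  Dairy      = c("milk", "sugar")
)

# One row per purchased item, tagged with the receipt it came from
items_long <- receipts %>%
  mutate(Purchase_ID = row_number()) %>%
  separate_rows(Items_purchased_on_Receipts, sep = ",") %>%
  rename(Item = Items_purchased_on_Receipts)

# One row per (Category, Item) pair
cat_long <- categories %>%
  gather("Category", "Item") %>%
  filter(!is.na(Item))

# One row per receipt, one column per category
basket <- items_long %>%
  inner_join(cat_long, by = "Item") %>%
  group_by(Purchase_ID, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  ungroup() %>%
  spread(key = "Category", value = "Item")

print(basket)
```

Categories with no item on a given receipt come out as NA, which is the layout asked for.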

HTH
Ulrik

On Wed, 30 Aug 2017 at 13:22 Hemant Sain <hemantsain55 at gmail.com> wrote:

> Using these two tables, we have to create a third table in this format,
> with the categories as column headers and the transactions as rows.
>
> On 30 August 2017 at 16:42, Hemant Sain <hemantsain55 at gmail.com> wrote:
>
>> Hello Ulrik,
>> Could you please check this code once more on the following data set? It
>> does not give me the same output as on the previous demo data set, because
>> the splitting is done on the quantity, and the quantity is missing in the
>> real data set. Please use the following data set and help me out. Consider
>> this my final email; I won't bother you again, but this is about my job,
>> so please help me.
>>
>> Note: the file I'm attaching is strictly confidential.
>>
>> On 30 August 2017 at 15:02, Ulrik Stervbo <ulrik.stervbo at gmail.com>
>> wrote:
>>
>>> Hi Hemant,
>>>
>>> Does this help you along?
>>>
>>> table_1 <- textConnection("Item_1;Item_2;Item_3
>>> 1KG banana;300ML milk;1kg sugar
>>> 2Large Corona_Beer;2pack Fries;
>>> 2 Lux_Soap;1kg sugar;")
>>>
>>> table_1 <- read.csv(table_1, sep = ";", na.strings = "",
>>> stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
>>> Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
>>> Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
>>> Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
>>> Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
>>>
>>> table_2 <- read.csv(table_2, sep = ";", na.strings = "",
>>> stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> library(tidyr)
>>> library(dplyr)
>>>
>>> table_2 <- gather(table_2, "Category", "Item")
>>>
>>> table_1 <- gather(table_1, "Foo", "Item") %>%
>>>   filter(!is.na(Item))
>>>
>>> table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
>>> sep = " ")
>>>
>>> table_3 <- left_join(table_1, table_2, by = "Item") %>%
>>>   mutate(Item = paste(Quantity, Item)) %>%
>>>   select(-Quantity)
>>>
>>> table_3 %>%
>>>   group_by(Foo, Category) %>%
>>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>>   spread(key = "Category", value = "Item")
>>>
>>> You need to figure out how to handle words written in different cases
>>> and how to extract the quantity in a universal way. For the code above, I
>>> corrected these things by hand in the example data.
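
One way to approach both points, as a rough sketch only: treat a leading token that starts with a digit as the quantity (and allow it to be absent), and join on a lower-cased copy of the name. The example items and the column names `quantity`, `name` and `key` are made up for illustration, and the pattern assumes the quantity never contains a space:

```r
# Hypothetical items, with and without a quantity, in mixed case
items <- c("1KG banana", "300ML milk", "2Large Corona_Beer",
           "2 Lux_Soap", "Sugar")

# A quantity is taken to be a leading token that starts with a digit,
# followed by at least one space and then the item name
pattern <- "^([0-9][^ ]*) +(.*)$"
has_qty <- grepl(pattern, items)

parsed <- data.frame(
  quantity = ifelse(has_qty, sub(pattern, "\\1", items), NA_character_),
  name     = ifelse(has_qty, sub(pattern, "\\2", items), items),
  stringsAsFactors = FALSE
)

# Lower-cased, trimmed key to join on, so "Sugar" and "sugar" match
parsed$key <- tolower(trimws(parsed$name))

print(parsed)
```

The category table would need the same `tolower(trimws(...))` treatment before the join, and a quantity written as two words (say "2 Large") would still need handling by hand.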
>>>
>>> HTH
>>> Ulrik
>>>
>>> On Wed, 30 Aug 2017 at 10:16 Hemant Sain <hemantsain55 at gmail.com> wrote:
>>>
>>>> Hey PIKAL,
>>>> It is not homework, and that is not the real dataset either. I have
>>>> signed an NDA with my company, so I cannot share the original data file.
>>>> I am working on a market basket analysis task, but I am not able to
>>>> convert my existing data table to an appropriate format so that I can
>>>> apply the Apriori algorithm in R. This is very important for me to get
>>>> done, because I am an intern, and if I don't get it done they will not
>>>> hire me as a full-time employee.
>>>> I tried everything by myself but was not able to get it done.
>>>> Your precious 10-15 minutes can save my upcoming years, so please help
>>>> me through this if you can.
>>>> I want another dataset based on the first two datasets I mentioned.
>>>>
>>>> Thanks
>>>>
>>>> On 30 August 2017 at 12:49, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>>>>
>>>> > Hi
>>>> >
>>>> > This looks like homework to me, and this help list has a no-homework
>>>> > policy.
>>>> >
>>>> > What do you want to do with your table 3? It seems futile to me.
>>>> >
>>>> > Anyway, some combination of melt, merge, cast and regular expressions
>>>> > could be employed for such a task, but it could be rather tricky.
>>>> >
>>>> > But be aware that
>>>> >
>>>> > Suger does not match sugar (and I wonder why sugar is a dairy product)
>>>> >
>>>> > and that you mix uppercase and lowercase letters, which could also be
>>>> > problematic when matching words.
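
For misspellings such as "Suger", approximate matching is one option. The sketch below uses base R's `adist()` (edit distance) on lower-cased names and accepts the closest catalogue entry only if it is within `max_dist` edits. The vectors `catalogue` and `purchased` and the threshold of 1 are assumptions for illustration, and the result would need checking by eye on real data:

```r
# Hypothetical category items and purchased item names
catalogue <- c("sugar", "banana", "milk", "Corona_Beer")
purchased <- c("Suger", "BANANA", "milk", "Toothpaste")

# Edit distance between every purchased name and every catalogue name,
# ignoring case
d <- utils::adist(tolower(purchased), tolower(catalogue))

# Closest catalogue entry for each purchased name, and its distance
best      <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(purchased), best)]

# Accept a match only if it is at most max_dist edits away (assumed cutoff)
max_dist <- 1
matched  <- ifelse(best_dist <= max_dist, catalogue[best], NA_character_)

print(data.frame(purchased, matched, stringsAsFactors = FALSE))
```

Names with no close catalogue entry stay NA instead of being forced onto the nearest item.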
>>>> >
>>>> > Cheers
>>>> > Petr
>>>> >
>>>> > > -----Original Message-----
>>>> > > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
>>>> > > Sent: Wednesday, August 30, 2017 8:28 AM
>>>> > > To: r-help at r-project.org
>>>> > > Subject: [R] Dataframe Manipulation
>>>> > >
>>>> > > I want to do a market basket analysis, and I'm trying to create a
>>>> > > dataset for that. I have two tables. The first table contains the
>>>> > > daily transactions of products; each row shows the items purchased
>>>> > > by a customer. The second table contains the parent groups that
>>>> > > those products fall under; for example, the fruit category contains
>>>> > > several fruits such as mango, banana, and apple.
>>>> > >
>>>> > > I want to create a third table in which the parent groups, extracted
>>>> > > from Table 2, are the headers, and each row represents a transaction
>>>> > > with the names of the products. If a transaction has nothing in a
>>>> > > parent category, the cell should be filled with NA. Please help me
>>>> > > with R or C/C++ code (R would be preferred). I am attaching all
>>>> > > three tables for reference: I have the first two tables and I want
>>>> > > to get a table like Table 3.
>>>> > >
>>>> > > Tables are explained in the attached doc.
>>>> > >
>>>> > > --
>>>> > > hemantsain.com
>>>> >