[R] How to define proper breaks in RFM analysis

Mon Oct 23 07:02:57 CEST 2017

Hello,
I'm confused about what you are all discussing.
I just want to set ideal threshold values for my RFM scores. That can be
done with quantiles, but I don't want to use quantiles because my data are
not normally distributed, so they would give the wrong break ranges. To fix
this problem I'm looking for an approach that can define ideal break ranges
to categorize RFM scores into 3 segments.
That's all I want.
Thanks
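
For illustration, here is a minimal sketch of one data-driven alternative to quantile breaks: cluster a single variable into 3 groups with k-means and place the cut points halfway between neighbouring cluster centers. Nobody in this thread proposed k-means, so treat it as an assumption about what "ideal" could mean, not as the recommended method. The freq vector holds the Frequency values of the 12 sample rows quoted below; the names freq, ctr, breaks and f_score are made up for this example.

```r
# Frequency values from the 12 sample rows quoted further down the thread
freq <- c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1)

set.seed(42)                                  # k-means starts from random centers
km <- kmeans(freq, centers = 3, nstart = 25)  # stats::kmeans with 3 clusters

# Cut points: midpoints between neighbouring (sorted) cluster centers
ctr    <- sort(as.numeric(km$centers))
breaks <- c(-Inf, (ctr[-length(ctr)] + ctr[-1]) / 2, Inf)

# Score 1 = lowest frequency group, 3 = highest
f_score <- cut(freq, breaks = breaks, labels = 1:3)
print(table(f_score))
```

The same steps could be repeated for Recency and Monetary. Whether clusters are a better notion of "ideal" than equal-sized groups is exactly the question raised in the replies below.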

On 14 October 2017 at 04:24, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hemant's problem is that the indicators are not distributed uniformly.
> With a uniform distribution, categorization gives a reasonably optimal
> separation of cases. One approach would be to drop categorization and
> calculate the overall score as the mean of the standardized indicator
> scores. Whether this is an option I do not know. I did offer an
> "eyeball" set of breaks in a previous email, but apparently this was
> not sufficient.
>
> Jim
>
> On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> >> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> >>
> >> Hi
> >>
> >> You expect us to solve your problem but you ignore advice already received.
> >>
> >> Your data are unreadable, use dput(yourdata) instead. see ?dput
> >>
> >>> test<-read.table("clipboard", header=TRUE)
> >> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
> >>  line 115 did not have 6 elements
> >
> > I didn't have such a problem: (illustrated with a more minimal example)
> >
> > dat <-  scan( what=list("",1,"",1L,1L,1),
> >              text="194849 6.99 8/22/2017 9 5 9.996
> > 194978 14.78 8/28/2017 3 15 16.308
> > 198614 18.44 7/31/2017 31 1 18.44
> > 234569 34.99 8/20/2017 11 8 13.5075
> > 252686 7.99 7/31/2017 31 2 7.99
> > 291719 21.26 8/25/2017 6 2 15.67
> > 291787 46.1 8/31/2017 0 2 32.57
> > 292630 24.34 7/31/2017 31 1 24.34
> > 295204 21.86 7/18/2017 44 1 21.86
> > 295989 8.98 8/20/2017 11 2 14.095
> > 298883 14.38 8/24/2017 7 2 11.185
> > 308824 10.77 7/31/2017 31 1 10.77")
> >
> > names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency",
> >                 "Frequency", "Monetary")
> > dat <- data.frame(dat,stringsAsFactors=FALSE)
> >
> > I suspect read.table would also have worked for me, but I was expecting difficulties based on Petr's posting.
> >
> >
> > #And ended up with this result (on the original copied data):
> >> str(dat)
> > 'data.frame':   500 obs. of  6 variables:
> >  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
> >  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
> >  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
> >  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
> >  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
> >  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
> >
> > ...  but the following criticism seems, well, _critical_ (as in essential to address if a reasonable proposal is to be offered).
> >
> >
> >> What is an „ideal interval“? Can you define it? Should it provide an equal number of observations?
> >
> > That is the crucial question for you to answer, Hemant. Read the ?quantile help page if your answer is "yes" or even "maybe".
> >>
> >> Or maybe you could normalise your values and use quartile method.
> >
> > Well, maybe not so much on that last one, Petr. Normalization should not affect a classification based on quartiles, because it doesn't change the ordering of the values.
> >
> > --
> > David.
> >
> >>
> >> Cheers
> >> Petr
> >>
> >> From: Hemant Sain [mailto:hemantsain55 at gmail.com]
> >> Sent: Friday, October 13, 2017 8:51 AM
> >> To: PIKAL Petr <petr.pikal at precheza.cz>
> >> Cc: r-help mailing list <r-help at r-project.org>
> >> Subject: Re: [R] How to define proper breaks in RFM analysis
> >>
> >> Hey,
> >> I want to define 3 ideal breaks (bins) for each variable; one of those variables was attached in the previous email.
> >> I don't want to use the quartile method because it does not work well for this data set: the data distribution is non-normal.
> >> So I want you to suggest another method so that I can define 3 breaks with the ideal interval for Recency, Frequency and Monetary to calculate the RFM score.
> >> I'm again attaching some of the data set.
> >> Please look into it and help me with the R code.
> >> Thanks
> >>
> >>
> >>
> >> Data
> >>
> >> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
> >> 194849   6.99             8/22/2017
> >>
> > snipped
> >
> >>
> >>
> >> On 13 October 2017 at 10:35, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> >> Hi
> >>
> >> Your statement about attaching data is problematic. We cannot do much with it. Instead, use the output of dput(yourdata) to show us exactly what your data look like.
> >>
> >> We also do not know how you want to split your data. It would be nice if you could also show what the bins should be, with the respective data. Unless you provide this information, you probably will not get any sensible answer.
> >>
> >> Cheers
> >> Petr
> >>
> >>
> >>> -----Original Message-----
> >>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
> >>> Sent: Thursday, October 12, 2017 10:18 AM
> >>> To: r-help mailing list <r-help at r-project.org>
> >>> Subject: [R] How to define proper breaks in RFM analysis
> >>>
> >>> Hello,
> >>> I'm working on an RFM analysis and want to define my own breaks, but my
> >>> frequency variable is not normally distributed, so using quartiles does
> >>> not give optimal results.
> >>> So I'm looking for a better approach where I can define breaks dynamically.
> >>> After visualization I can do it easily by hand, but I want to apply a model
> >>> that automatically defines the breaks according to the data set.
> >>> I'm attaching sample data for reference.
> >>>
> >>> Thanks
> >>>
> >>>                           *Freq*
> >>> 5
> >>> 15
> >>> 1
> > snipped
> >> .
> >>
> >>       [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > 'Any technology distinguishable from magic is insufficiently advanced.'  -Gehm's Corollary to Clarke's Third Law
> >
>
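
For reference, Jim's suggestion quoted above (drop the categorization and use the mean of the standardized indicator scores) might look roughly like the sketch below. It uses only the 12 sample rows from David's example. Flipping the sign of Recency so that more recent customers score higher is my assumption, not something stated in the thread, and the names rfm, z and score are made up for this example.

```r
# Recency, Frequency and Monetary from the 12 sample rows in David's example
rfm <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31),
  Frequency = c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
                32.57, 24.34, 21.86, 14.095, 11.185, 10.77)
)

# Standardize each indicator to mean 0 and standard deviation 1
z <- scale(rfm)

# Assumption: fewer days since the last order is better, so flip the sign
z[, "Recency"] <- -z[, "Recency"]

# Overall score: mean of the three standardized indicators per customer
rfm$score <- rowMeans(z)
print(rfm[order(-rfm$score), ])
```

This gives a continuous score per customer instead of 3 bins, so it sidesteps the break-point question rather than answering it.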

-- 
hemantsain.com
