[R] How to define proper breaks in RFM analysis

Mon Oct 23 09:40:34 CEST 2017

Using quantiles does not imply an assumption of normality, unless you drag that assumption in separately. Please go review statistics again, off-list, and come back when you need help with R.
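A minimal R sketch of that point, using simulated right-skewed values (an assumption for illustration, not the poster's data). Tertile breaks taken from `quantile()` put about a third of the observations in each bin whatever the shape of the distribution, because they are defined by ranks rather than by a mean and standard deviation.

```r
# Simulated, strongly right-skewed "Monetary" values (illustrative only)
set.seed(42)
monetary <- rlnorm(300, meanlog = 2.5, sdlog = 0.8)

# Empirical tertile breaks: the 0%, 33.3%, 66.7% and 100% quantiles
breaks <- quantile(monetary, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value in the first bin
m_score <- cut(monetary, breaks = breaks, labels = 1:3, include.lowest = TRUE)

# Each bin holds about 100 of the 300 values despite the skew
table(m_score)
```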
-- 
Sent from my phone. Please excuse my brevity.

On October 22, 2017 10:02:57 PM PDT, Hemant Sain <hemantsain55 at gmail.com> wrote:
>Hello,
>I'm confused about what you guys are talking about.
>I just want to set ideal threshold values for my RFM scores. This can be
>done using quantiles, but I don't want to use quantiles because my data is
>not normally distributed, so it will lead to the wrong ranges of breaks. To
>fix this problem I'm looking for an approach that can define the ideal
>ranges of breaks to categorize RFM scores into 3 segments.
>That's all I want.
>Thanks
>
>
>On 14 October 2017 at 04:24, Jim Lemon <drjimlemon at gmail.com> wrote:
>
>> Hemant's problem is that the indicators are not distributed uniformly.
>> With a uniform distribution, categorization gives a reasonably optimal
>> separation of cases. One approach would be to drop categorization and
>> calculate the overall score as the mean of the standardized indicator
>> scores. Whether this is an option I do not know. I did offer an
>> "eyeball" set of breaks in a previous email, but apparently this was
>> not sufficient.
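Jim's alternative (no categorization, just the mean of standardized indicators) can be sketched in base R as below. The data are the twelve sample rows David quotes further down the thread. Flipping the sign of Recency, so that fewer days since the last purchase scores higher, is an assumption of this sketch and is not stated in the thread.

```r
# The 12 sample rows quoted in this thread (Recency, Frequency, Monetary only)
rfm <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31),
  Frequency = c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
                32.57, 24.34, 21.86, 14.095, 11.185, 10.77)
)

# Assumption: a smaller Recency is better, so negate it before combining.
# scale() centers each column to mean 0 and rescales it to sd 1.
z <- scale(cbind(-rfm$Recency, rfm$Frequency, rfm$Monetary))

# Overall score: mean of the three standardized indicators per customer
rfm$score <- rowMeans(z)

# Customers ordered from highest to lowest combined score
rfm[order(-rfm$score), ]
```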
>>
>> Jim
>>
>> On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> >
>> >> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>> >>
>> >> Hi
>> >>
>> >> You expect us to solve your problem, but you ignore advice you have
>> >> already received.
>> >>
>> >> Your data are unreadable; use dput(yourdata) instead. See ?dput.
>> >>
>> >>> test <- read.table("clipboard", header = TRUE)
>> >> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>> >>   line 115 did not have 6 elements
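For reference, a minimal sketch of what Petr is asking for. `dput()` prints an R expression that recreates an object exactly, so readers can paste it into their own session instead of re-parsing pasted text. The three rows are taken from the sample David posts later in the thread; the object name `sample_rows` is just an illustrative choice.

```r
# Three rows from the sample data quoted in this thread
sample_rows <- data.frame(
  user_id         = c("194849", "194978", "198614"),
  subtotal_amount = c(6.99, 14.78, 18.44),
  created_at      = c("8/22/2017", "8/28/2017", "7/31/2017"),
  Recency         = c(9L, 3L, 31L),
  Frequency       = c(5L, 15L, 1L),
  Monetary        = c(9.996, 16.308, 18.44),
  stringsAsFactors = FALSE
)

# Print a copy-and-paste-able definition of the data frame
dput(sample_rows)

# Round trip: evaluating the dput() text rebuilds an identical object
rebuilt <- eval(parse(text = capture.output(dput(sample_rows))))
identical(rebuilt, sample_rows)
```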
>> >
>> > I didn't have such a problem (illustrated with a more minimal example):
>> >
>> > dat <-  scan( what=list("",1,"",1L,1L,1),
>> >              text="194849 6.99 8/22/2017 9 5 9.996
>> > 194978 14.78 8/28/2017 3 15 16.308
>> > 198614 18.44 7/31/2017 31 1 18.44
>> > 234569 34.99 8/20/2017 11 8 13.5075
>> > 252686 7.99 7/31/2017 31 2 7.99
>> > 291719 21.26 8/25/2017 6 2 15.67
>> > 291787 46.1 8/31/2017 0 2 32.57
>> > 292630 24.34 7/31/2017 31 1 24.34
>> > 295204 21.86 7/18/2017 44 1 21.86
>> > 295989 8.98 8/20/2017 11 2 14.095
>> > 298883 14.38 8/24/2017 7 2 11.185
>> > 308824 10.77 7/31/2017 31 1 10.77")
>> >
>> > names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency",
>> >                 "Frequency", "Monetary")
>> > dat <- data.frame(dat,stringsAsFactors=FALSE)
>> >
>> > I suspect read.table would also have worked for me, but I was expecting
>> > difficulties based on Petr's posting.
>> >
>> >
>> > #And ended up with this result (on the original copied data):
>> >> str(dat)
>> > 'data.frame':   500 obs. of  6 variables:
>> >  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
>> >  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
>> >  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>> >  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
>> >  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
>> >  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
>> >
>> > ...  but the following criticism seems, well, _critical_ (as in essential
>> > for one to address if a reasonable proposal is to be offered.)
>> >
>> >
>> >> What is an „ideal interval“? Can you define it? Should it be such that
>> >> each interval holds an equal number of observations?
>> >
>> > That is the crucial question for you to answer, Hemant. Read the
>> > ?quantile help page if your answer is "yes" or even "maybe".
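If the answer is "yes", `quantile()` plus `cut()` gives equal-count bins, but only approximately when a variable has many ties, as Frequency does. A short sketch using the Frequency values from the twelve sample rows in this thread. The rank-based variant at the end is one possible workaround, not something proposed in the thread; it forces exactly equal counts by splitting tied customers across bins according to their order in the data, which is arbitrary.

```r
# Frequency values from the 12 sample rows in this thread (many ties at 1 and 2)
freq <- c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1)

# Tertile breaks. If two breaks coincide (roughly, when a single value makes
# up more than a third of the data), cut() stops with "'breaks' are not unique".
br <- quantile(freq, probs = c(0, 1/3, 2/3, 1))
f_q <- cut(freq, breaks = br, labels = 1:3, include.lowest = TRUE)
table(f_q)     # ties prevent an exact 4/4/4 split

# Rank-based alternative: break ties by position, then cut ranks into thirds
r <- rank(freq, ties.method = "first")
f_rank <- ceiling(3 * r / length(freq))
table(f_rank)  # exactly 4 per group
```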
>> >>
>> >> Or maybe you could normalise your values and use quartile method.
>> >
>> > Well, maybe not so much on that last one, Petr. Normalization should not
>> > affect the classification based on quartiles. It doesn't change the
>> > ordering of the values.
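David's point can be checked directly: standardizing, or any other strictly increasing transformation such as a log, keeps the order of the values, so quartile bins come out the same. A sketch with the Monetary values from the sample rows; `quartile_bin()` is a hypothetical helper name introduced only for this illustration.

```r
# Monetary values from the 12 sample rows in this thread
mon <- c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
         32.57, 24.34, 21.86, 14.095, 11.185, 10.77)

# Hypothetical helper: integer quartile codes 1..4 for a numeric vector
quartile_bin <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
      labels = FALSE, include.lowest = TRUE)
}

raw_bins <- quartile_bin(mon)
z_bins   <- quartile_bin(as.vector(scale(mon)))  # standardized (z-scores)
log_bins <- quartile_bin(log(mon))               # another monotone transform

identical(raw_bins, z_bins)
identical(raw_bins, log_bins)
```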
>> >
>> > --
>> > David.
>> >
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >> From: Hemant Sain [mailto:hemantsain55 at gmail.com]
>> >> Sent: Friday, October 13, 2017 8:51 AM
>> >> To: PIKAL Petr <petr.pikal at precheza.cz>
>> >> Cc: r-help mailing list <r-help at r-project.org>
>> >> Subject: Re: [R] How to define proper breaks in RFM analysis
>> >>
>> >> Hey,
>> >> I want to define 3 ideal breaks (bins) for each variable; one of those
>> >> variables is attached in the previous email.
>> >> I don't want to use the quartile method because it does not work well
>> >> for this data set, whose distribution is non-normal.
>> >> So I want you to suggest another method so that I can define 3 breaks
>> >> with ideal intervals for Recency, Frequency and Monetary to calculate
>> >> the RFM score.
>> >> I'm again attaching some of the data set.
>> >> Please look into it and help me with the R code.
>> >> Thanks
>> >> Thanks
>> >>
>> >>
>> >>
>> >> Data
>> >>
>> >> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> >> 194849   6.99             8/22/2017
>> >>
>> > snipped
>> >
>> >>
>> >>
>> >> On 13 October 2017 at 10:35, PIKAL Petr <petr.pikal at precheza.cz> wrote:
>> >> Hi
>> >>
>> >> Your statement about attaching data is problematic. We cannot do much
>> >> with it. Instead, use the output of dput(yourdata) to show us exactly
>> >> what your data look like.
>> >>
>> >> We also do not know how you want to split your data. It would be nice
>> >> if you could also show what the bins should be, with the respective
>> >> data. Unless you provide this information, you probably will not get
>> >> any sensible answer.
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
>> >>> Sent: Thursday, October 12, 2017 10:18 AM
>> >>> To: r-help mailing list <r-help at r-project.org>
>> >>> Subject: [R] How to define proper breaks in RFM analysis
>> >>>
>> >>> Hello,
>> >>> I'm working on RFM analysis and I wanted to define my own breaks, but
>> >>> my frequency distribution is not normal, so when I use quartiles it's
>> >>> not giving optimal results.
>> >>> So I'm looking for a better approach where I can define breaks
>> >>> dynamically. I can do it easily after visualizing the data, but I want
>> >>> to apply this model so that it automatically defines the breaks
>> >>> according to the data set.
>> >>> I'm attaching sample data for reference.
>> >>>
>> >>> Thanks
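One data-driven option for the automatic breaks Hemant describes, not proposed anywhere in this thread, is to cluster each variable on its own with one-dimensional k-means and place the breaks halfway between adjacent cluster centers. A minimal base-R sketch, applied to the Monetary values from the sample rows; `kmeans_breaks()` is a hypothetical helper name. Unlike quantile breaks, the resulting groups need not have equal sizes, and k-means depends on random starts, hence `set.seed()` and `nstart`.

```r
# Hypothetical helper: k - 1 interior breaks from 1-D k-means cluster centers
kmeans_breaks <- function(x, k = 3, nstart = 25) {
  fit <- kmeans(x, centers = k, nstart = nstart)
  ctr <- sort(as.vector(fit$centers))
  # Breaks halfway between neighbouring centers, open-ended at both extremes
  c(-Inf, (head(ctr, -1) + tail(ctr, -1)) / 2, Inf)
}

# Monetary values from the 12 sample rows in this thread
monetary <- c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
              32.57, 24.34, 21.86, 14.095, 11.185, 10.77)

set.seed(1)
br <- kmeans_breaks(monetary, k = 3)
m_score <- cut(monetary, breaks = br, labels = 1:3)
table(m_score)
```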
>> >>>
>> >>> Freq
>> >>> 5
>> >>> 15
>> >>> 1
>> > snipped
>> >> .
>> >>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > David Winsemius
>> > Alameda, CA, USA
>> >
>> > 'Any technology distinguishable from magic is insufficiently advanced.'
>> >   -Gehm's Corollary to Clarke's Third Law
>> >
>>
>
>
>
>-- 
>hemantsain.com
>
>