# [R] Relative Cumulative Frequency of Event Occurrence

arun smartpink111 at yahoo.com
Fri Nov 29 18:56:11 CET 2013

```
Hi Burhan,

No problem.  One suggestion in this code would be:
with(df.1, cumsum(E.Occur==TRUE)/(seq_len(nrow(df.1))))  ##==TRUE is not needed
identical( with(df.1, cumsum(E.Occur)/(seq_len(nrow(df.1)))),   with(df.1, cumsum(E.Occur==TRUE)/(seq_len(nrow(df.1)))) )

is.logical(TRUE)
#[1] TRUE

is.logical("Yes")
#[1] FALSE
A.K.
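
## A small sketch of why `== TRUE` is redundant: cumsum() coerces a logical
## vector to integer (TRUE -> 1, FALSE -> 0). `x` is a made-up example vector,
## not data from this thread.
x <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
cumsum(x)                 # running count of TRUE values
#[1] 0 1 1 1 2
cumsum(x) / seq_along(x)  # running relative frequency
#[1] 0.0000000 0.5000000 0.3333333 0.2500000 0.4000000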

On Friday, November 29, 2013 12:36 PM, Burhan ul haq <ulhaqz at gmail.com> wrote:

Hi Arun,

Thanks a lot. It works perfectly.

Here is the complete code, for anyone interested in seeing "Rel Cum Freq oscillating to reach the Expected Value":

# Bernoulli trial where:
library(ggplot2) # needed for the plot below
v.fly=c("G","B") # Outcome is Green or Blue fly
n=100 # No of Events / Trials
v.smp = seq(1:n) # Event Id
v.fst = sample(v.fly,n,rep=T) # Simulating First Draw
v.sec = sample(v.fly,n,rep=T)  # Simulating Second Draw
df.1 = data.frame(sample = v.smp, fst=v.fst, sec = v.sec) # Clumping in a DF
df.1$E.Occur = with(df.1, ifelse(fst==sec,TRUE,FALSE)) # Event occurs if the color is the same in both draws
df.1$Rel.Freq = with(df.1, cumsum(E.Occur==TRUE)/(seq_len(nrow(df.1)))) # Relative Frequency
df.1$Rel.Freq = round(df.1$Rel.Freq,2)

ggplot(df.1, aes(x=sample,y=Rel.Freq)) +
  geom_line(col="green",size=2) +
  geom_abline(intercept=0.5,slope=0) +
  geom_point(col="blue") +
  labs(x="Sample No",y="Relative Cum Freq",title="Rel Cum Freq approaching 0.5 Value") +
  annotate("text",x=60,y=0.53,label="Probability of 0.5")

Cheers !
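
## Why the line settles near 0.5, as a rough sketch assuming each fly is
## equally likely to be "G" or "B" and the two draws are independent:
## P(same color) = P(G,G) + P(B,B) = 0.5*0.5 + 0.5*0.5 = 0.5
## `p.g` and `p.same` are illustrative names, not from the code above.
p.g <- 0.5                      # assumed probability of a green fly
p.same <- p.g^2 + (1 - p.g)^2   # probability that both draws match
p.same
#[1] 0.5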

On Thu, Nov 28, 2013 at 9:40 PM, arun <smartpink111 at yahoo.com> wrote:

Hi,
>From the dput() version of df.1, it looks like you want:
>cumsum(df.1[,4]=="Yes")/seq_len(nrow(df.1))
> [1] 0.0000000 0.5000000 0.3333333 0.2500000 0.4000000 0.3333333 0.4285714
> [8] 0.5000000 0.4444444 0.5000000
>
>
>A.K.
>
>
>
>On Thursday, November 28, 2013 11:26 AM, Burhan ul haq <ulhaqz at gmail.com> wrote:
>Hi,
>
>My objective is to calculate "Relative (Cumulative) Frequency of Event
>Occurrence" - something as follows:
>
>Sample.Number 1st.Fly 2nd.Fly  Did.E.occur? Relative.Cum.Frequency.of.E
>1 G B No 0.000
>2 B B Yes 0.500
>3 B G No 0.333
>4 G B No 0.250
>5 G G Yes 0.400
>6 G B No 0.333
>7 B B Yes 0.429
>8 G G Yes 0.500
>9 G B No 0.444
>10 B B Yes 0.500
>
>Please refer to the code below:
>##############################################################
># 1.
>v.fly=c("G","B") # Outcome is Green or Blue fly
>
># 2.
>n=10 # No of Events / Trials
>
># 3.
>v.smp = seq(1:n) # Event Id
>
># 4.
>v.fst = sample(v.fly,n,rep=T) # Simulating First Draw
>
># 5.
>v.sec = sample(v.fly,n,rep=T)  # Simulating Second Draw
>
># 6.
>df.1 = data.frame(sample = v.smp, fst=v.fst, sec = v.sec) # Clumping in a DF
>
># 7.
>df.1$E.Occur = with(df.1, ifelse(fst==sec,TRUE,FALSE)) # Event occurs if the color is the same in both draws
>
># 8.
>df.1$Rel.Freq = with(df.1, cumsum(E.occur)/(E.Occur)) # Relative Frequency
>>> This line does NOT work; the denominator part needs to be fixed
>##############################################################
>
>Problem is with #8, specifically the part:
>cumsum(E.occur)/(E.Occur)
>
>The denominator E.Occur is a fixed value, instead of a moving count. I have
>tried nrow(), length() but none provides a moving version of row count, as
>cumsum does for the "True" values, occurring so far.
>
>> dput(df.1)
>structure(list(Sample.Number = 1:10, X1st.Fly = c("G", "B", "B",
>"G", "G", "G", "B", "G", "G", "B"), X2nd.Fly = c("B", "B", "G",
>"B", "G", "B", "B", "G", "B", "B"), Did.E.occur. = c("No", "Yes",
>"No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"),
>Relative.Cum.Frequency.of.E = c(0,
>0.5, 0.333, 0.25, 0.4, 0.333, 0.429, 0.5, 0.444, 0.5)), .Names =
>c("Sample.Number",
>"X1st.Fly", "X2nd.Fly", "Did.E.occur.", "Relative.Cum.Frequency.of.E"
>), class = "data.frame", row.names = c(NA, -10L))
>
>
>Cheers !
>