# [R] Comparison of the amount of computation

Petr Savicky savicky at praha1.ff.cuni.cz
Thu Apr 14 08:58:44 CEST 2011

```
On Wed, Apr 13, 2011 at 04:12:39PM -0700, helin_susam wrote:
> Hi dear list,
>
> I want to compare the amount of computation of two functions. For example,
> by using this algorithm;
>
> data <- rnorm(n=100, mean=10, sd=3)
>
> output1 <- list ()
> for(i in 1:100) {
> data1 <- sample(100, 100, replace = TRUE)
> statistic1 <- mean(data1)
> output1 <- c(output1, list(statistic1))
> }
> output1
>
> output2 <- list()
> for(i in 1:100) {
> data2 <- unique(sample(100, 100, replace=TRUE))
> statistic2 <- mean(data2)
> output2 <- c(output2, list(statistic2))
> }
> output2
>
> data1 consists of exactly 100 elements, but data2 consists of roughly 55 or
> 60 elements. So, to get statistic1, for each sample, 100 data points are
> used. But, to get statistic2 roughly half of them are used.
> I want to prove this difference. Is there any way to do this?

Hi.

Every number from 1:100 has probability 1 - (1 - 1/100)^100 = 0.6339677
to appear in sample(100, 100, replace=TRUE). So, by linearity of
expectation, the expected length of data2 is 100 * 0.6339677 = 63.39677.
If you want to estimate the distribution of the lengths of data2 using
a simulation, then record length(data2). For example:

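n <- 10000
s <- rep(NA, times=n)   # one slot per simulated resample
for (i in 1:n) {
  # number of distinct values in one resample of size 100
  s[i] <- length(unique(sample(100, 100, replace=TRUE)))
}
cbind(table(s))         # frequency of each observed length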

I obtained

   [,1]
53    5
54   16
55   27
56   82
57  165
58  294
59  465
60  672
61  970
62 1168
63 1283
64 1303
65 1111
66  882
67  626
68  435
69  250
70  143
71   57
72   27
73   14
74    5

In this case, mean(sample(100, 100, replace=TRUE)) and
mean(unique(sample(100, 100, replace=TRUE))) have the same
expected value 50.5, because sampling from 1:100 is uniform and so
symmetric around 50.5. However, eliminating repeated values may,
in general, change the expected value of the sample mean.

Hope this helps.

Petr Savicky.

```
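
The two claims in the reply can be checked numerically. The following is a minimal sketch in R and is not part of the original message; the names `N`, `reps`, `expected_distinct` and the skewed weights `w` are assumptions chosen only for illustration. It compares the closed-form expected number of distinct values with a simulated average, and then shows one non-uniform sampling scheme in which dropping duplicates shifts the expected sample mean.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

N <- 100      # population size and sample size, as in the thread
reps <- 10000 # number of simulated resamples (illustrative choice)

# Closed form: each value appears with probability 1 - (1 - 1/N)^N,
# so the expected number of distinct values is N times that.
expected_distinct <- N * (1 - (1 - 1/N)^N)

# Simulated number of distinct values per resample.
distinct <- replicate(reps, length(unique(sample(N, N, replace = TRUE))))

# Uniform sampling: both averages should be close to (N + 1) / 2 = 50.5.
mean_all  <- mean(replicate(reps, mean(sample(N, N, replace = TRUE))))
mean_uniq <- mean(replicate(reps, mean(unique(sample(N, N, replace = TRUE)))))

# Non-uniform sampling (hypothetical weights favouring large values):
# duplicates are concentrated among large values, so removing them
# pulls the mean of the distinct values down.
w <- (1:N)^2
skew_all  <- mean(replicate(reps,
  mean(sample(N, N, replace = TRUE, prob = w))))
skew_uniq <- mean(replicate(reps,
  mean(unique(sample(N, N, replace = TRUE, prob = w)))))

print(c(expected = expected_distinct, simulated = mean(distinct)))
print(c(uniform_all = mean_all, uniform_unique = mean_uniq))
print(c(skewed_all = skew_all, skewed_unique = skew_uniq))
```

With uniform sampling the two averages should agree up to simulation noise, while with the skewed weights the mean of the distinct values should come out noticeably lower than the mean of the full resample. The weights `w` are only one possible way to make the sampling non-uniform.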