[R] queue waiting times comparison
jim holtman
jholtman at gmail.com
Thu Aug 18 15:39:49 CEST 2011
If those values represent response times in a system, here is how I
handled it when I was responsible for characterizing system behavior
against an SLA (service level agreement) with customers: we usually
specified that "90% of the transactions would have a response time of
--- or less". This took care of most "long tails". So it depends on how
you are planning to use this data. We usually monitored the 90th or
95th percentile to see how a system was operating day to day.
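For what it's worth, that kind of percentile summary can be sketched in a few lines of R. The function name summarise_times is made up for this illustration (the my.func quoted below was not posted in the thread, so this is only a guess at something that produces the same columns), and quantile() is used with its default interpolation.

```r
# Waiting times as posted earlier in this thread
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# Hypothetical stand-in for my.func (the original definition was not shown)
summarise_times <- function(x) {
  # 90th and 95th percentiles, default quantile type
  q <- quantile(x, probs = c(0.90, 0.95), names = FALSE)
  c(Count = length(x), Mean = mean(x), SD = sd(x), Min = min(x),
    Median = median(x), "90%" = q[1], "95%" = q[2],
    Max = max(x), Sum = sum(x))
}

# One row per period, one column per statistic
res <- t(sapply(ml, summarise_times))
print(round(res, 2))
```

For SLA-style monitoring, the "90%" and "95%" columns are the ones to track from period to period.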
On Thu, Aug 18, 2011 at 8:52 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hello Jim
>
> Thank you; see my comments inline.
>
> jim holtman <jholtman at gmail.com> wrote on 18.08.2011 14:09:11:
>
>> I am not sure why you say that "lapply(ml, mean)" shows (incorrectly)
>> that the second year has a larger average; it is correct for the data:
>>
>> > lapply(ml, my.func)
>> $y1
>>     Count      Mean        SD       Min    Median       90%       95%       Max       Sum
>>  18.00000  16.83333  12.42980   4.00000  12.50000  37.20000  41.05000  47.00000 303.00000
>>
>> $y2
>>     Count      Mean        SD       Min    Median       90%       95%       Max       Sum
>>  15.00000  20.06667  25.27694   4.00000  11.00000  45.80000  70.40000  97.00000 301.00000
>>
>>
>> You have a larger "outlier" in the second year that causes the mean to
>> be higher. The median is lower, but I usually look at the 90th
>> percentile if I am looking at response time from a system and again
>> the second year has a higher value.
>>
>> So exactly why do you not "trust" your data?
>
> Well, I trust them; however, the mean is a "correct" central value only
> when the data are normally distributed or at least symmetrical. As the
> values are heavily skewed, I feel that I should not use the mean to
> compare such sets. Anyway, t.test tells me that there is no significant
> difference between y2 and y1.
>
> > t.test(ml[[1]], ml[[2]])
>
>         Welch Two Sample t-test
>
> data:  ml[[1]] and ml[[2]]
> t = -0.452, df = 19.557, p-value = 0.6563
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
>  -18.17781  11.71115
> sample estimates:
> mean of x mean of y
>  16.83333  20.06667
>
> So based on this I will probably never get a conclusive result, as the
> sd will be quite high due to the "outliers".
>
> When I do
> plot(ecdf(ml[[2]]))
> plot(ecdf(ml[[1]]), add=TRUE, col=2)
>
> it seems to me that both sets are almost the same and that they differ
> substantially only in those "outlier" values.
>
> If I halve the small values of y2, e.g.
>
> ml[[2]][ml[[2]]<20] <- ml[[2]][ml[[2]]<20]/2
>
> I get almost the same mean
>
> lapply(ml, mean)
> $y1
> [1] 16.83333
>
> $y2
> [1] 16.1
>
> and t.test tells me that there is no difference between those two sets,
> although I know that most events now take half the time and only a few
> last longer, so for me such a set is better (we improved performance
> most of the time; however, there are still rare events which take a
> long time).
>
> plot(ecdf(ml[[2]]))
> plot(ecdf(ml[[1]]), add=TRUE, col=2)
>
> So the question remains: what procedure should I use to compare two
> or more sets with such long-tailed distributions? Trimmed mean? Median?
> ...
>
> Thanks.
>
> Regards
> Petr
>
>>
>> On Thu, Aug 18, 2011 at 7:49 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
>> > Hello all
>> >
>> > I am trying to find a way to compare sets of waiting times from
>> > different periods. I tried to learn something from queueing theory
>> > and also used R search. There are plenty of ways, but I need to find
>> > the easiest and simplest one.
>> > Here is a list with actual waiting times.
>> >
>> > ml <- structure(list(y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
>> > 18, 36, 5, 24, 15, 40, 10), y2 = c(97, 10, 26, 11, 11, 10, 5,
>> > 13, 19, 5, 5, 59, 4, 16, 10)), .Names = c("y1", "y2"))
>> >
>> > par(mfrow=c(1,2))
>> > lapply(ml, hist)
>> >
>> > shows that the first year has more long waiting times
>> >
>> > lapply(ml, mean)
>> >
>> > shows (incorrectly) that the second year has a longer average
>> > waiting time.
>> >
>> > lapply(ml, median)
>> >
>> > gives me completely reversed values.
>> >
>> > Can you please give me some hints on what to use for a "correct" and
>> > "simple" comparison of waiting times in two or more periods?
>> >
>> > Thank you
>> > Petr
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>
>
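On the question of which procedure to use for long-tailed sets (trimmed mean, median, ...), one possible starting point is sketched below. It puts the mean next to a few location estimates that are less sensitive to the tail and adds a rank-based test. The 10% trim and the choice of wilcox.test are assumptions made for this illustration, not a recommendation from the thread, and a rank test answers a somewhat different question than a comparison of means or of upper percentiles.

```r
# Waiting times as posted earlier in this thread
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# Mean next to estimates that react less to a few very long waits
robust <- sapply(ml, function(x) c(
  Mean      = mean(x),
  Trimmed10 = mean(x, trim = 0.10),          # drops 10% from each end
  Median    = median(x),
  P90       = unname(quantile(x, 0.90))      # tail behaviour, SLA style
))
print(round(robust, 2))

# Rank-based two-sample comparison; exact = FALSE because the data have ties
wt <- wilcox.test(ml$y1, ml$y2, exact = FALSE)
print(wt$p.value)
```

Which row matters depends on the goal: the median or trimmed mean describes the typical wait, while the 90th percentile describes how bad the rare long waits are.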
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?