[R] queue waiting times comparison
Petr PIKAL
petr.pikal at precheza.cz
Thu Aug 18 16:12:57 CEST 2011
Hi Jim
>
> If those values represent response times in a system, then when I was
> responsible for characterizing what the system would do from the
> viewpoint of an SLA (service level agreement) with customers using the
> system, we usually specified that "90% of the transactions would have
> a response time of --- or less". This took care of most "long tails".
> So it depends on how you are planning to use this data. We usually
> monitored the 90th or 95th percentile to see how a system was
> operating day to day.
I get the point. This can be an option. I will discuss it with my
colleagues.
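A minimal sketch of that percentile check, using the `ml` data quoted further down. It assumes R's default quantile definition (type 7), and `sla_limit` is a made-up threshold for illustration only, not a value from this thread:

```r
# Waiting times from the original post (quoted further down).
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# 90th and 95th percentiles per period; quantile() defaults to type = 7.
pct <- sapply(ml, quantile, probs = c(0.90, 0.95))
print(pct)

# Hypothetical SLA threshold (an assumption, not from the thread):
# share of waiting times at or below it in each period.
sla_limit <- 40
within_sla <- sapply(ml, function(x) mean(x <= sla_limit))
print(within_sla)
```

Tracking one or two upper percentiles per period in this way would sidestep the question of whether the mean is a sensible summary for long-tailed data.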
Thank you for your time and your answer.
Best regards
Petr
>
> On Thu, Aug 18, 2011 at 8:52 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> > Hello Jim
> >
> > Thank you; see my comments inline.
> >
> > jim holtman <jholtman at gmail.com> wrote on 18.08.2011 14:09:11:
> >
> >> I am not sure why you say that "lapply(ml, mean)" shows (incorrectly)
> >> that the second year has a larger average; it is correct for the data:
> >>
> >> > lapply(ml, my.func)
> >> $y1
> >>     Count      Mean        SD       Min    Median       90%       95%       Max       Sum
> >>  18.00000  16.83333  12.42980   4.00000  12.50000  37.20000  41.05000  47.00000 303.00000
> >>
> >> $y2
> >>     Count      Mean        SD       Min    Median       90%       95%       Max       Sum
> >>  15.00000  20.06667  25.27694   4.00000  11.00000  45.80000  70.40000  97.00000 301.00000
> >>
> >>
> >> You have a larger "outlier" in the second year that causes the mean to
> >> be higher. The median is lower, but I usually look at the 90th
> >> percentile if I am looking at response time from a system, and again
> >> the second year has a higher value.
> >>
> >> So exactly why do you not "trust" your data?
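The `my.func` used above is not defined anywhere in this thread. The following is only a hypothetical reconstruction that yields the same column names, assuming the percentiles come from `quantile()` with its default type 7:

```r
# Waiting times from the original post.
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# Hypothetical stand-in for my.func: the real function was not posted.
my.func <- function(x) {
  c(Count  = length(x),
    Mean   = mean(x),
    SD     = sd(x),
    Min    = min(x),
    Median = median(x),
    quantile(x, probs = c(0.90, 0.95)),  # contributes the "90%" and "95%" names
    Max    = max(x),
    Sum    = sum(x))
}

res <- lapply(ml, my.func)
print(res)
```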
> >
> > Well, I trust them; however, the mean is a "correct" central value only
> > when the data are normally distributed or at least symmetric. As the
> > values are heavily skewed, I feel that I should not use the mean to
> > compare such sets. Anyway, t.test tells me that there is no significant
> > difference between y2 and y1.
> >
> >> t.test(ml[[1]], ml[[2]])
> >
> >         Welch Two Sample t-test
> >
> > data:  ml[[1]] and ml[[2]]
> > t = -0.452, df = 19.557, p-value = 0.6563
> > alternative hypothesis: true difference in means is not equal to 0
> > 95 percent confidence interval:
> >  -18.17781  11.71115
> > sample estimates:
> > mean of x mean of y
> >  16.83333  20.06667
> >
> > So based on this I will probably never get a conclusive result, as the
> > sd will be quite high due to the "outliers".
> >
> > When I do
> > plot(ecdf(ml[[2]]))
> > plot(ecdf(ml[[1]]), add=TRUE, col=2)
> >
> > it seems to me that both sets are almost the same and that they differ
> > substantially only in those "outlier" values.
> >
> > If I halve the small values of y2, e.g.
> >
> > ml[[2]][ml[[2]]<20] <- ml[[2]][ml[[2]]<20]/2
> >
> > I get nearly the same mean
> >
> > lapply(ml, mean)
> > $y1
> > [1] 16.83333
> >
> > $y2
> > [1] 16.1
> >
> > and t.test tells me that there is no significant difference between
> > those two sets, although I know that most events take half the time and
> > only a few last longer. For me such a set is better: we improved
> > performance most of the time, although there are still rare events
> > which take a long time.
> >
> > plot(ecdf(ml[[2]]))
> > plot(ecdf(ml[[1]]), add=TRUE, col=2)
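The halving experiment above can also be checked numerically instead of by eye. This is a minimal sketch; the helper `summ` and the choice of the 75th percentile are illustrative assumptions, not something proposed in the thread:

```r
# Waiting times from the original post.
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# Same modification as in the message: halve every y2 value below 20.
y2_half <- ml$y2
small <- y2_half < 20
y2_half[small] <- y2_half[small] / 2

# Illustrative helper: mean next to two order-based summaries.
summ <- function(x) {
  c(mean = mean(x), median = median(x), quantile(x, probs = 0.75))
}

cmp <- rbind(y1 = summ(ml$y1), y2_half = summ(y2_half))
print(cmp)
```

The means stay close to each other because the three long waits dominate them, while the median of the modified set falls well below that of y1, which is the improvement the mean and the t-test do not show.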
> >
> > So the question remains: what procedure should I use to compare two or
> > more sets with such a long-tailed distribution? Trimmed mean? Median?
> > ...
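As a rough sketch of the options named in the question, the median and a trimmed mean can be computed side by side, with a rank-based test alongside. The 10% trim and the choice of `wilcox.test` are assumptions made for illustration; nobody in this thread recommends them, and a rank test compares the distributions as a whole rather than their tails:

```r
# Waiting times from the original post.
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15, 18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5, 13, 19, 5, 5, 59, 4, 16, 10)
)

# Median and 10% trimmed mean per period (trim = 0.1 is an arbitrary choice).
robust <- sapply(ml, function(x) {
  c(median = median(x), trimmed = mean(x, trim = 0.1))
})
print(robust)

# Wilcoxon rank-sum test; exact = FALSE because the data contain ties.
wt <- wilcox.test(ml$y1, ml$y2, exact = FALSE)
print(wt$p.value)
```

With only 18 and 15 observations, any such test has little power, so the descriptive summaries may be the more useful part.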
> >
> > Thanks.
> >
> > Regards
> > Petr
> >
> >>
> >> On Thu, Aug 18, 2011 at 7:49 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> >> > Hello all
> >> >
> >> > I am trying to find a way to compare sets of waiting times from
> >> > different periods. I tried to learn something from queueing theory
> >> > and also used R search. There are plenty of ways, but I need to find
> >> > the easiest and simplest one.
> >> > Here is a list with actual waiting times.
> >> >
> >> > ml <- structure(list(y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
> >> > 18, 36, 5, 24, 15, 40, 10), y2 = c(97, 10, 26, 11, 11, 10, 5,
> >> > 13, 19, 5, 5, 59, 4, 16, 10)), .Names = c("y1", "y2"))
> >> >
> >> > par(mfrow=c(1,2))
> >> > lapply(ml, hist)
> >> >
> >> > shows that the first year has more long waiting times
> >> >
> >> > lapply(ml, mean)
> >> >
> >> > shows (incorrectly) that the second year has a longer average
> >> > waiting time.
> >> >
> >> > lapply(ml, median)
> >> >
> >> > gives me completely reversed values.
> >> >
> >> > Can you please give me some hints on what to use for a "correct" and
> >> > "simple" comparison of waiting times in two or more periods?
> >> >
> >> > Thank you
> >> > Petr
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >>
> >>
> >>
> >> --
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> What is the problem that you are trying to solve?
> >
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?