[R] Advice/suggestions on using a stacked barplot with "zoo"?
David Wolfskill
david at catwhisker.org
Tue Feb 22 20:44:36 CET 2011
I'm not sure I'm using appropriate tools for what I'm trying to do; a
reason for my lack of certainty is that I'm encountering difficulty with
what I thought would be fairly simple: labeling the X axis of my graphs
with human-readable timestamps.
I'm using R 2.12.1, running in a FreeBSD 8.2-PRERELEASE r218945
environment.
As I mentioned in a previous note, I'm working with data collected via
sampling over a period of time. Each sample has a (unique) POSIXct
timestamp and a couple hundred data values. By the time I feed them to R,
they're in a format suitable for read.table(..., header = TRUE).
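A minimal sketch of that loading step, for readers who want to follow along. The column name "time", the UTC time zone, and the inline three-row sample (copied from the CPU listing further down) are assumptions made for illustration; they are not details of the original data files.

```r
library(zoo)

# Hypothetical input: three samples copied from the CPU listing in this post.
# The "time" column name is an assumption; it holds seconds since the epoch.
txt <- "time user nice sys intr idle
1298333405 28722903 25098 4900282 1809059 2811144985
1298333415 28722906 25098 4900289 1809059 2811155661
1298333425 28723842 25098 4900478 1809068 2811165209"

raw <- read.table(text = txt, header = TRUE)

# Use the timestamps as the zoo index and everything else as the data matrix.
when <- as.POSIXct(raw$time, origin = "1970-01-01", tz = "UTC")
CPU  <- zoo(as.matrix(raw[, -1]), order.by = when)

# Lagged differences of the counters; the result has one row fewer than CPU.
CPUd <- diff(CPU)
print(CPUd)
```

Building the index with as.POSIXct() at load time means that later calls to format() on index(CPU) can produce time-of-day labels directly.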
As each sample has an associated unique timestamp, and I'm investigating
relationships of the data over time (intending to identify correlations
between trends and with externally-observed behavior), and it may
happen that a sample gets "missed," I'm using library(zoo). [Thank
you, Achim Zeileis, Gabor Grothendieck, and Ajay Shah!]
The samples may occur as frequently as once per second; unfortunately,
that's the current upper limit of precision of the sampling technique
I'm using. On the other hand, storing the timestamps as POSIXct is both
convenient and natural for this situation.
The bulk of the (other) data are numeric (though not quite all). Many
of the numeric data are "counters" (that is, the "interesting bit" is
the difference in value from one sample to another, vs. the magnitude of
the value itself); others are (in RRDtool parlance) "gauges" -- their
value in a given sample represents the state of the corresponding
environment, quite apart from any corresponding values from other
samples.
One set of data that serve to illustrate what I'm doing is counters
for CPU state. The way this works is that the system receives a
periodic interrupt and uses the occasion of servicing this "statclock"
interrupt to sample the CPU state and increment a corresponding
counter.
At any given moment, the CPU is considered to be in one of 5 states:
* user
* nice
* system
* interrupt
* idle
There is a separate counter for each of these 5 states; at boot time,
each counter starts at 0, and at each statclock interrupt, precisely one
of these counters will be incremented (by one).
Given samples of each of these counters at each end of a suitable time
interval, it is then possible to determine the (mean) breakdown of CPU
usage during that interval by:
* Determining the corresponding differences for each of the above
states.
* Determining the total "statclock ticks" during the interval (by summing
the differences).
* Determining the proportion of time spent in each state by dividing the
difference for each state by the total (and multiplying by 100%, to get
a percentage, if that's desired -- and in my case, it is).
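The three steps above can be sketched in base R for a single interval. The counter values are the first two samples from the CPU listing later in this post; the variable names (before, after, delta, total, pct, busy) are illustrative only.

```r
# Two consecutive samples of the five counters, copied from the post.
before <- c(user = 28722903, nice = 25098, sys = 4900282,
            intr = 1809059, idle = 2811144985)
after  <- c(user = 28722906, nice = 25098, sys = 4900289,
            intr = 1809059, idle = 2811155661)

delta <- after - before        # step 1: per-state differences
total <- sum(delta)            # step 2: statclock ticks in the interval
pct   <- 100 * delta / total   # step 3: percentage of time in each state

# Total %CPU busy is everything except "idle".
busy <- sum(pct[names(pct) != "idle"])

print(round(pct, 5))
print(busy)
```

The user and idle percentages computed this way agree with the first row of the prop.table() output shown further down (about 0.028 and 99.906).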
After using the data to populate a zoo, I generate another zoo of the
lagged differences (via diff()).
I then graph the result using a stacked barplot(). In my case, I choose
to first graph the "system" time, then "interrupt", then "user", then
"nice". I explicitly do not graph the "idle" time; thus, the top of the
graph represents total %CPU busy.
In doing this, I (belatedly) encountered prop.table(), which seems
appropriate for calculating the required proportions -- but I seem to need to
feed it a matrix, rather than a zoo, then convert the result back to a
zoo. That's a level of awkwardness that seems suspect to me.
(Before I encountered prop.table(), I had cobbled up a function that
seemed to work, at least as much as I tested it (not much!); its use did
manage to avoid the coercion to matrix & back to zoo.)
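Two possible ways to keep the zoo index while computing row proportions are sketched below; neither is taken from the original post. The first relies on ordinary arithmetic between a zoo matrix and a plain numeric vector, which, as far as I know, recycles the row sums down the columns and preserves the index. The second uses zoo's coredata() replacement form so that prop.table() only ever sees a plain matrix. The three-row CPUd here is a stand-in built from the differences listed further down.

```r
library(zoo)

# Stand-in for CPUd: the first three rows of differences from the post.
idx  <- as.POSIXct(c(1298333415, 1298333425, 1298333435),
                   origin = "1970-01-01", tz = "UTC")
CPUd <- zoo(cbind(user = c(3, 936, 3079),
                  nice = c(0, 0, 0),
                  sys  = c(7, 189, 792),
                  intr = c(0, 9, 9),
                  idle = c(10676, 9548, 6803)),
            order.by = idx)

# Option 1: divide each row by its row sum; the result is still a zoo.
pct1 <- CPUd / rowSums(CPUd) * 100

# Option 2: replace only the data part, leaving the index untouched.
pct2 <- CPUd
coredata(pct2) <- prop.table(coredata(CPUd), 1) * 100

print(pct1)
```

Either result can be subset by column name (for example pct1[, c("sys", "intr", "user", "nice")]) before plotting, so the explicit zoo(..., index(CPUd)) round trip is not needed.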
Here are some samples:
> cpu_states
[1] "user" "nice" "sys" "intr" "idle"
> plot_states
[1] "sys" "intr" "user" "nice"
> CPU[1:10]
               user  nice     sys    intr       idle
1298333405 28722903 25098 4900282 1809059 2811144985
1298333415 28722906 25098 4900289 1809059 2811155661
1298333425 28723842 25098 4900478 1809068 2811165209
1298333435 28726921 25098 4901270 1809077 2811172012
1298333445 28730078 25098 4902053 1809086 2811178746
1298333455 28732541 25098 4902176 1809099 2811186833
1298333465 28735600 25098 4902473 1809105 2811194152
1298333475 28737791 25098 4902654 1809105 2811202463
1298333485 28740727 25098 4902855 1809108 2811210006
1298333495 28741826 25098 4903096 1809109 2811219348
> CPUd <- diff(CPU)
> CPUd[1:10]
           user nice sys intr  idle
1298333415    3    0   7    0 10676
1298333425  936    0 189    9  9548
1298333435 3079    0 792    9  6803
1298333445 3157    0 783    9  6734
1298333455 2463    0 123   13  8087
1298333465 3059    0 297    6  7319
1298333475 2191    0 181    0  8311
1298333485 2936    0 201    3  7543
1298333495 1099    0 241    1  9342
1298333505 4401    0 390    4  5889
> prop.table(CPUd[1:10], 1)
Error in dn[[2L]] : subscript out of bounds
> prop.table(as.matrix(CPUd[1:10]), 1)
           user nice          sys         intr      idle
2  0.0002807412    0 0.0006550627 0.000000e+00 0.9990642
3  0.0876240404    0 0.0176933159 8.425389e-04 0.8938401
4  0.2882149209    0 0.0741364785 8.424600e-04 0.6368061
5  0.2955162408    0 0.0732940185 8.424600e-04 0.6303473
6  0.2304884896    0 0.0115103874 1.216545e-03 0.7567846
7  0.2863964048    0 0.0278063852 5.617452e-04 0.6852355
8  0.2050922026    0 0.0169428063 0.000000e+00 0.7779650
9  0.2748291678    0 0.0188149396 2.808200e-04 0.7060751
10 0.1028737246    0 0.0225592062 9.360666e-05 0.8744735
11 0.4119243729    0 0.0365031823 3.743916e-04 0.5511981
> prop.table(as.matrix(CPUd[1:10]), 1)*100
          user nice        sys        intr     idle
2   0.02807412    0 0.06550627 0.000000000 99.90642
3   8.76240404    0 1.76933159 0.084253885 89.38401
4  28.82149209    0 7.41364785 0.084245998 63.68061
5  29.55162408    0 7.32940185 0.084245998 63.03473
6  23.04884896    0 1.15103874 0.121654501 75.67846
7  28.63964048    0 2.78063852 0.056174515 68.52355
8  20.50922026    0 1.69428063 0.000000000 77.79650
9  27.48291678    0 1.88149396 0.028081999 70.60751
10 10.28737246    0 2.25592062 0.009360666 87.44735
11 41.19243729    0 3.65031823 0.037439161 55.11981
> zoo(prop.table(as.matrix(CPUd[1:10]), 1), index(CPUd))[1:10]*100
                  user nice        sys        intr     idle
1298333415  0.02807412    0 0.06550627 0.000000000 99.90642
1298333425  8.76240404    0 1.76933159 0.084253885 89.38401
1298333435 28.82149209    0 7.41364785 0.084245998 63.68061
1298333445 29.55162408    0 7.32940185 0.084245998 63.03473
1298333455 23.04884896    0 1.15103874 0.121654501 75.67846
1298333465 28.63964048    0 2.78063852 0.056174515 68.52355
1298333475 20.50922026    0 1.69428063 0.000000000 77.79650
1298333485 27.48291678    0 1.88149396 0.028081999 70.60751
1298333495 10.28737246    0 2.25592062 0.009360666 87.44735
1298333505 41.19243729    0 3.65031823 0.037439161 55.11981
> barplot(zoo(prop.table(as.matrix(CPUd[, c(plot_states, "idle")]), 1)*100,
+             index(CPUd))[, 1:4],
+         border = NA, col = plot_colors, ylim = c(0, 100), space = 0,
+         legend.text = plot_states,
+         args.legend = c(x = "topleft", title = "CPU states"),
+         ylab = "%CPU", xlab = "Time (seconds)",
+         main = "CPU utilization during FreeBSD \"make -j12 buildworld\"")
And the resulting graph is mostly OK, but the X axis is labeled with
the raw POSIXct values. I've tried various forms of evasive maneuvers
(mostly, specifying "..., axisnames = FALSE" in the barplot()
invocation, followed by invocations involving axis() subsequently)...
but I seem to be unable to get either human-readable time-of-day
representations ("%T") or a sequence showing the elapsed time from
the origin (either in seconds or minutes, for example).
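One approach that may help with the labels, sketched here with base graphics only: barplot() returns the X positions of the bar midpoints, and those can be passed to axis() together with labels produced by format() or by subtracting the first timestamp. The data are three rows of percentages from the output above; the UTC time zone and the names when, pct, mids, tod, elapsed, and lab are assumptions for illustration.

```r
# Timestamps (seconds since the epoch) and percentages copied from the post.
secs <- c(1298333415, 1298333425, 1298333435)
when <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
pct  <- cbind(sys  = c(0.06550627, 1.76933159, 7.41364785),
              intr = c(0.000000000, 0.084253885, 0.084245998),
              user = c(0.02807412, 8.76240404, 28.82149209),
              nice = c(0, 0, 0))

# barplot() stacks each column of its matrix argument into one bar, so the
# samples must run along the columns: hence t(). Suppress the default names.
mids <- barplot(t(pct), space = 0, border = NA, ylim = c(0, 100),
                axisnames = FALSE, ylab = "%CPU", xlab = "Time of day (UTC)")

tod     <- format(when, "%T")                                    # HH:MM:SS
elapsed <- as.numeric(difftime(when, when[1], units = "secs"))   # seconds

# Label every bar here; for a long series use a larger step, e.g. by = 60.
lab <- seq(1, length(mids), by = 1)
axis(1, at = mids[lab], labels = tod[lab])
# Alternative: elapsed seconds from the first plotted sample.
# axis(1, at = mids[lab], labels = elapsed[lab])
```

With a zoo object, the same idea should apply by taking the timestamps from index() and the matrix from coredata(); thinning the labels with a step in seq() keeps the axis readable when samples arrive every second.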
So I'd appreciate suggestions for:
* Simplifying the approach -- I have difficulty believing that coercing
a zoo to a matrix and back again is likely to be sensible for
something as simple as what I'm trying to do.
* Making the X axis a bit more human-friendly.
Thanks!
Peace,
david
--
David H. Wolfskill david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.