[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups

Wed Aug 14 23:03:17 CEST 2013

I'm not sure I follow you exactly so let's start with some data
and one graph and move on from there:

First the data. (I'm assuming you don't have A to A, so you really
want 3 lines on a graph?)

set.seed(42)
pairs <- structure(list(
    From = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
        .Label = c("A", "B", "C", "D"), class = "factor"),
    To = structure(c(2L, 3L, 4L, 1L, 3L, 4L, 1L, 2L, 4L, 1L, 2L, 3L),
        .Label = c("A", "B", "C", "D"), class = "factor")),
    .Names = c("From", "To"), class = "data.frame",
    row.names = c(NA, -12L))
net <- data.frame(pairs[sample.int(12, 1000, replace=TRUE),],
	Time=rnorm(1000, .2, .05))
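
With the real measurements, the simulated net would be replaced by a
read.table() call like the one in your message. A minimal sketch,
feeding the three sample rows from your post through the text=
argument so that it runs without the file (the names sample_rows and
real are only for illustration):

```r
# Three sample rows from the original post; with the real file, replace
# text=sample_rows with the file name, e.g. "eth_ping_timings.dat"
sample_rows <- "RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12"
real <- read.table(text=sample_rows, col.names=c("From", "To", "Time"),
	stringsAsFactors=TRUE)
str(real)                    # From and To are factors, Time is numeric
with(real, table(From, To))  # number of timings for each From/To pair
```

The table() call is a quick way to check which pairings are present
and how many timings each one has.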

Now generate one plot:

plot(density(net$Time[net$From=="A" & net$To=="B"]), xlim=c(0, .4),
	ylim=c(0, 8), main="From A")
lines(density(net$Time[net$From=="A" & net$To=="C"]), lty=2)
lines(density(net$Time[net$From=="A" & net$To=="D"]), lty=3)
legend("topright", c("B", "C", "D"), lty=1:3)
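
The same idea can be extended to all four origins with shared axis
limits. What follows is a rough sketch, not a finished solution. It
rebuilds the simulated data in a more compact way with expand.grid()
(so the random numbers will not match the ones above), computes every
density first to get a common xlim/ylim, and uses layout() for a
2 x 2 page. The names rooms and dens are ones I made up here.

```r
set.seed(42)
rooms <- c("A", "B", "C", "D")
# All ordered pairs of rooms except a room paired with itself (12 rows)
pairs <- expand.grid(From=rooms, To=rooms)
pairs <- pairs[as.character(pairs$From) != as.character(pairs$To), ]
net <- data.frame(pairs[sample.int(nrow(pairs), 1000, replace=TRUE), ],
	Time=rnorm(1000, .2, .05))

# One density per From/To pair that occurs; list names look like "A.B"
dens <- lapply(split(net$Time, list(net$From, net$To), drop=TRUE), density)

# Common axis limits taken over all of the densities
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

# To write a file instead of drawing on screen, open a device such as
# postscript("from_rooms.ps") before layout() and call dev.off() after
# the loop (that file name is made up)
layout(matrix(1:4, 2, 2, byrow=TRUE))
for (from in rooms) {
	to <- setdiff(rooms, from)   # the three destinations for this origin
	plot(dens[[paste(from, to[1], sep=".")]], xlim=xlim, ylim=ylim,
		main=paste("From", from))
	for (i in 2:3) lines(dens[[paste(from, to[i], sep=".")]], lty=i)
	legend("topright", to, lty=1:3)
}
layout(1)   # back to one plot per page
```

With the real data, rooms would come from levels(net$From) rather
than being typed in.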

Is this on the right track?

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Jack Challen
Sent: Wednesday, August 14, 2013 10:23 AM
To: r-help at r-project.org
Subject: [R] Producing multiple analyses (histograms/kernel
densities) of network timings between groups

(This is a repost from a little while ago. I assume my mail got
silently bounced because I used some rather strange email
routing. If it did get through, and I simply haven't seen it or
a response, then please accept my apologies)
Hi,

I'm new to R, and new to statistics. I'm *trying* to learn R,
but I'm struggling with the R-intro, mainly (I think) due to the
fact that I have no background in stats, and some of the
language is unfamiliar to me (I started with C and Perl, mainly)
so I might use the wrong terms. I think the "R in action" book
might help, but recommendations are welcome.

I have a whole bunch of network timings (ICMP echos) between
different groups of nodes using two different networks. I want
to compare the timings between the groups and across networks,
as I /believe/ that one network has much greater variability
than the other. I want to prove this, one way or the other, and
I think a graphical view of the ~20000 results would help. The
initial histograms/kernel densities I've produced so far support
that theory (i.e. they look a bit like the Normal distribution,
but one network is much more "stretched out" and "bumpy"), but
I've resorted to pre-processing that data in Perl in order to
produce the graphs. I think R can be used to do all of this in
one.

For each network, I have files like this:

===
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
===

The columns are: From, To, and Time taken. There are 4 rooms in
total.
The data's unsorted, and there will be multiple pairs (i.e. I
haven't done de-duplication of pairings via the handshake
algorithm, I just pinged everything from everything). There will
be multiple entries for each pairing.

The graphs I think I want to produce are:

For "From RoomA", overlay each timing graph for every other
room. That means there will be 4 kernel densities (well actually
I'd take a histogram plotted as a line, as I think that's more
appropriate, and I don't know what a kernel density is) on one
graph.
I'd also like to do the above for "From RoomB", "From RoomC",
and "From RoomD", so I'd end up with 4 graphs (all with the
same xlim/ylim) each with 4 lines plotted. I'd eventually like
those produced as vector Postscript for inclusion in a report,
but I think I can handle that with ?postscript() and ?layout().

I've got as far as importing the data with
read.table("eth_ping_timings.dat", col.names=c("From", "To",
"Time"))
Then I can do "standard" simple operations on Foo$Time.
"Factoring" (if that is indeed the term) is where I fall down. I
simply don't know how to break out the pairings.
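
One possible way to break out the pairings, sketched with a data
frame called Foo as above: either index Time with a logical
condition for a single pairing, or use split() to get one vector per
pairing. The first three rows below are the sample rows shown
earlier; the last two are made up so that some pairings have more
than one timing.

```r
# Rows in the same three-column layout as the file; last two are made up
Foo <- read.table(text="RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
RoomA RoomB 0.30
RoomB RoomA 0.16", col.names=c("From", "To", "Time"),
	stringsAsFactors=TRUE)

# Pull out a single pairing with a logical condition
a_to_b <- Foo$Time[Foo$From == "RoomA" & Foo$To == "RoomB"]

# Or split Time into one vector per From/To pairing in a single step;
# drop=TRUE leaves out pairings that never occur
by_pair <- split(Foo$Time, list(Foo$From, Foo$To), drop=TRUE)
sapply(by_pair, length)  # number of timings in each pairing
sapply(by_pair, sd)      # spread of each pairing (NA for a single timing)
```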

Is R actually the way to go for this? I feel pretty confident I
could cobble together some Perl which produces Postscript to
describe the curves, but I suspect that once I produce these
graphs, I will immediately think of other questions to
ask, and R sounds like it's the proper tool to ask those
questions.

cheers
jack

