[R] Transforming simulation data which is spread across many files into a barplot

Bert Gunter gunter.berton at gene.com
Fri Jun 11 21:02:55 CEST 2010

Ouch! Lousy plot. Instead, plot the 50 (mean sent, mean received) pairs as a
y vs x scatterplot to see the relationship.
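
A minimal sketch of that scatterplot, assuming the per-file means have already been collected into a data frame. The `avg` data frame and its values below are made up for illustration and are not taken from the simulation logs:

```r
# Hypothetical per-file means; in practice these would come from the 50 log files.
set.seed(1)
avg <- data.frame(sent = runif(50, min = 150, max = 450))

# Invented relationship, only so the example has something to plot
avg$received <- 9 * avg$sent + rnorm(50, sd = 200)

# One point per file: mean received (y) against mean sent (x)
plot(received ~ sent, data = avg,
     xlab = "Mean sent", ylab = "Mean received")
```

With real data, any trend or outlying files should be visible directly from the point cloud.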

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Hadley Wickham
Sent: Friday, June 11, 2010 11:53 AM
To: Ian Bentley
Cc: r-help at r-project.org
Subject: Re: [R] Transforming simulation data which is spread across
many files into a barplot

On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
> I'm an R newbie, and I'm just trying to use some of its graphing
> capabilities, but I'm a bit stuck - basically in massaging the already
> available data into a format R likes.
> I have a simulation environment which produces logs, which represent a
> number of different things.  I then run a Python script on this data, which
> puts it in a nicer format.  Essentially, the Python script reduces the
> number of files by two orders of magnitude.
> What I'm left with, is a number of files, which each have two columns of
> data in them.
> The files look something like this:
> --1000.log--
> Sent Received
> 405.0 3832.0
> 176.0 1742.0
> 176.0 1766.0
> 176.0 1240.0
> 356.0 3396.0
> ...
> This file - called 1000.log - represents a data point at 1000. What I'd like
> to do is to use a loop, to read in 50 or so of these files, and then create
> a stacked barplot.  Ideally, the stacked barplot would have 1 bar per file
> and two stacks per bar.  The first stack would be the mean of the sent, and
> the second would be the mean of the received.
> I've used a loop to read files in R before, something like this ---
> for (i in 1:50){
>    tmpFile <- paste(base, i*100, ".log", sep="")
>    tmp <- read.table(tmpFile)
> }

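# Load data
library(plyr)

paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)

# ldply() reads each file and stacks the results into one data frame;
# the file name ends up in the ".id" column
df <- ldply(paths, read.table, header = TRUE)

# Compute averages (the column names come from the file header):
avg <- ddply(df, ".id", summarise,
  sent = mean(Sent),
  received = mean(Received)
)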

You can read more about plyr at http://had.co.nz/plyr.
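
For the stacked barplot asked about in the original question, one possible sketch using base R's `barplot()` follows. It assumes a data frame of per-file averages with columns `.id`, `sent` and `received`; the values below are invented for illustration:

```r
# Hypothetical averages standing in for the per-file summary;
# the numbers are invented, not computed from real logs.
avg <- data.frame(
  .id      = c("1000.log", "1100.log", "1200.log"),
  sent     = c(250, 300, 280),
  received = c(2400, 2800, 2600)
)

# barplot() stacks the rows of a matrix within each bar,
# so put the two measures in rows and the files in columns.
heights <- t(as.matrix(avg[, c("sent", "received")]))

# Returns the x positions of the bar midpoints (one per file)
mids <- barplot(heights,
                names.arg = avg$.id,
                legend.text = c("Mean sent", "Mean received"),
                xlab = "Log file", ylab = "Mean count")
```

Passing `beside = TRUE` would draw the two means side by side instead of stacked.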


Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
