[R] Transforming simulation data which is spread across many files into a barplot

Hadley Wickham hadley at rice.edu
Fri Jun 11 20:52:56 CEST 2010


On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
> I'm an R newbie, and I'm just trying to use some of its graphing
> capabilities, but I'm a bit stuck - basically in massaging the already
> available data into a format R likes.
>
> I have a simulation environment which produces logs, which represent a
> number of different things.  I then run a Python script on this data to
> put it in a nicer format.  Essentially, the Python script reduces the
> number of files by two orders of magnitude.
>
> What I'm left with is a number of files, each of which has two columns of
> data.
> The files look something like this:
> --1000.log--
> Sent Received
> 405.0 3832.0
> 176.0 1742.0
> 176.0 1766.0
> 176.0 1240.0
> 356.0 3396.0
> ...
>
> This file - called 1000.log - represents a data point at 1000. What I'd like
> to do is to use a loop, to read in 50 or so of these files, and then produce
> a stacked barplot.  Ideally, the stacked barplot would have 1 bar per file,
> and two stacks per bar.  The first stack would be the mean of the sent, and
> the second would be the mean of the received.
>
> I've used a loop to read files in R before, something like this ---
>
> for (i in 1:50){
>    tmpFile <- paste(base, i*100, ".log", sep="")
>    tmp <- read.table(tmpFile)
> }
>

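# Load data
library(plyr)

# Named vector of paths, so ldply() records each file name in an .id column
paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)

# Read every file (first line is the header) and stack into one data frame
df <- ldply(paths, read.table, header = TRUE)

# Compute averages, one row per file:
avg <- ddply(df, ".id", summarise,
  Sent = mean(Sent),
  Received = mean(Received)
)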
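
For the stacked barplot itself, here is a minimal sketch using base graphics. It assumes the per-file means end up in a data frame with columns .id, Sent and Received. The first row below is the mean of the five sample lines from 1000.log quoted above; the other two rows, the file names 1100.log and 1200.log, and the output file name are made up purely for illustration.

```r
# Stand-in for the per-file means. Row 1 comes from the five sample lines
# of 1000.log; rows 2 and 3 are invented values for illustration only.
avg <- data.frame(
  .id      = c("1000.log", "1100.log", "1200.log"),
  Sent     = c(257.8, 301.2, 280.5),
  Received = c(2395.2, 2710.4, 2544.0)
)

# barplot() stacks the rows of a matrix within each bar, so build a
# 2 x n matrix: one row per series, one column per file.
heights <- t(as.matrix(avg[, c("Sent", "Received")]))
colnames(heights) <- sub("\\.log$", "", avg$.id)

# Draw into a PDF in the session's temp directory (hypothetical file name).
out <- file.path(tempdir(), "sent_received.pdf")
pdf(out)
mids <- barplot(heights,
  legend.text = rownames(heights),
  xlab = "Data point", ylab = "Mean per file"
)
dev.off()
```

barplot() stacks by default (beside = FALSE), so each file gets one bar with the Sent mean at the bottom and the Received mean on top. Pass beside = TRUE if side-by-side bars read better.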

You can read more about plyr at http://had.co.nz/plyr.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


