[R] Combining a list of similar dataframes into a single data frame [Broadcast]

Sun Jul 9 02:50:00 CEST 2006

A couple of suggestions:

1. This screams out for do.call.  Try jj <- do.call("rbind", t1).
2. Use rowSums() instead of apply(..., 1, sum).
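Here is a minimal sketch of both suggestions on made-up data. The data frame net, its values and the matrix m are hypothetical. Here t1 is built the same way as in the original post, with by() over two grouping factors, so it contains a NULL for the empty server/counter combination:

```r
# Hypothetical toy data: server "B" has no rows for counter "y",
# so by() leaves that cell of its result NULL.
net <- data.frame(server       = c("A", "A", "A", "B"),
                  countername  = c("x", "x", "y", "x"),
                  countervalue = c(1, 2, 3, 4))

# A "by" object like t1: one data frame per server/countername cell.
t1 <- by(net, list(net$server, net$countername), function(x) x)

# Suggestion 1: a single rbind() call over every piece of t1.
# rbind() drops the NULL (empty) cells, so no filtering is needed.
jj <- do.call("rbind", t1)

# Suggestion 2: row totals of a numeric matrix. rowSums() does the work
# in one compiled call; apply(m, 1, sum) calls sum() once per row.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
totals <- rowSums(m)
stopifnot(identical(totals, apply(m, 1, sum)))
```

In the posted code, the same substitution would turn apply(sapply(g[!sapply(g,is.null)],I),1,sum) into rowSums(sapply(g[!sapply(g,is.null)],I)). This works provided sapply() returns a matrix, which it does when every piece has the same length.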

Andy

  _____  

From: r-help-bounces at stat.math.ethz.ch on behalf of Mike Nielsen
Sent: Sat 7/8/2006 7:20 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Combining a list of similar dataframes into a single dataframe [Broadcast]

Well, this worked, and rather more quickly than I had expected. 

Many thanks to the dogs, who told me the answer in return for walking 
them and feeding them! 

> jj <- eval(parse(text=paste(sep=" ","rbind(",paste(sep=" ","t1[[",1:length(t1),"]]",collapse=","),")")))
> str(jj) 
`data.frame':   85644 obs. of  4 variables: 
 $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
 $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ countervalue: num    NA  938  816 4213  906 ... 
> 
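For comparison, here is a small sketch of what the paste() expression above builds, checked against the do.call() suggestion. The three one-row data frames in t1 are made up:

```r
# Hypothetical list of three small data frames standing in for t1.
t1 <- list(data.frame(a = 1), data.frame(a = 2), data.frame(a = 3))

# The text the nested paste() calls build:
# "rbind( t1[[ 1 ]],t1[[ 2 ]],t1[[ 3 ]] )"
txt <- paste(sep = " ", "rbind(",
             paste(sep = " ", "t1[[", 1:length(t1), "]]", collapse = ","),
             ")")
via_parse <- eval(parse(text = txt))

# do.call() constructs the same rbind() call from the list directly,
# without a round trip through a character string and the parser.
via_docall <- do.call("rbind", t1)

stopifnot(identical(via_parse, via_docall))
```

Both versions make one rbind() call with every piece as an argument, which is why they avoid the cost of growing a data frame inside a loop. do.call() simply skips the string building and parsing.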

On 7/8/06, Mike Nielsen <mr.blacksheep at gmail.com> wrote: 
> I would be very grateful to anyone who could point to the error of my 
> ways in the following. 
> 
> I have a dataframe called net1, as such: 
> 
> > str(net1) 
> `data.frame':    114192 obs. of  9 variables: 
>  $ server         : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ ts             :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" "2006-06-30 12:31:44" ...
>  $ instance       : Factor w/ 22 levels "1","2","Compaq Ethernet_Fast Ethernet Adapter_Module",..: 4 4 4 4 4 4 4 4 4 4 ...
>  $ instanceno     : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
>  $ perftime       : num  3.16e+13 3.16e+13 3.16e+13 3.16e+13 3.16e+13 ... 
>  $ perffreq       : num  6.99e+08 6.99e+08 6.99e+08 6.99e+08 6.99e+08 ... 
>  $ perftime100nsec: num  1.28e+17 1.28e+17 1.28e+17 1.28e+17 1.28e+17 ... 
>  $ countername    : Factor w/ 4 levels "Bytes Received/sec",..: 1 3 2 4 1 3 2 4 1 3 ...
>  $ countervalue   : num  6.08e+07 6.64e+07 5.58e+06 1.00e+08 6.09e+07 ... 
> > 
> 
> What I am trying to do is subset this thing down by server, instance, 
> instanceno, countername and then apply a function to each subsetted 
> dataframe.  The function performs a calculation on countervalue, 
> essentially "collapsing" instanceno and instance down to a single 
> value. 
> 
> Here is a snippet of my code: 
> t1 <- by(net1,
>          list(
>               net1$server,
>               factor(as.character(net1$countername))), # get rid of unused levels of countername for this server
>          function(x){
>            g <- by(x,
>                    list(factor(as.character(x$instance)),    # get rid of unused levels of instance for this server
>                         factor(as.character(x$instanceno))), # same with instanceno
>                    function(y){c(NA,mean(y$perffreq)*diff(y$countervalue)/diff(y$perftime))})
>            data.frame(server=x$server,
>                       ts=x$ts,
>                       countername = x$countername,
>                       countervalue = apply(sapply(g[!sapply(g,is.null)],I),1,sum))
>          })
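One possible alternative to the nested by() calls is sketched below. It computes the same per-instance rate with ave(), which returns results aligned with the rows of its input, and then sums across instances with aggregate(). The result is already a single data frame, so there is no list left to rbind. The data frame d, the helper rate_of() and all values are made up. The sketch assumes rows are sorted by ts within each instance. It matches instances by timestamp, rather than by row position as apply(..., 1, sum) does.

```r
# Hypothetical toy data: one server and counter, two instances, three
# samples each, 60 s apart (6e7 ticks at perffreq = 1e6 ticks/s).
d <- data.frame(
  server       = "AB93-99",
  countername  = "Bytes Received/sec",
  instance     = rep(c("1", "2"), each = 3),
  instanceno   = "1",
  ts           = as.POSIXct("2006-06-30 12:31:44", tz = "UTC") + rep(c(0, 60, 120), 2),
  perffreq     = 1e6,
  perftime     = rep(c(0, 6e7, 1.2e8), 2),
  countervalue = c(0, 6000, 18000, 0, 1200, 2400)
)

# Hypothetical helper: the same formula as the inner by() function,
# NA for the first sample, then mean(perffreq) * diff(countervalue) / diff(perftime).
# Assumes the rows of each group are already in ts order.
rate_of <- function(i) {
  c(NA, mean(d$perffreq[i]) * diff(d$countervalue[i]) / diff(d$perftime[i]))
}

# ave() calls rate_of() on the row indices of each
# server/countername/instance/instanceno group and writes the results
# back in the original row order.
d$rate <- ave(seq_len(nrow(d)), d$server, d$countername, d$instance,
              d$instanceno, FUN = rate_of)

# Collapse the instances: one summed rate per server, ts and counter.
collapsed <- aggregate(d["rate"], by = d[c("server", "ts", "countername")],
                       FUN = sum)
```

With the toy values, instance 1 contributes 100 and 200 and instance 2 contributes 20 and 20, so collapsed$rate is NA, 120, 220.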
> 
> So t1 is then a list of dataframes, each with an identical set of columns:
> 
> > str(t1[[1]]) 
> `data.frame':   149 obs. of  4 variables: 
>  $ server      : Factor w/ 122 levels "AB93-99","AMP93-1",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ ts          :'POSIXct', format: chr  "2006-06-30 12:31:44" "2006-06-30 12:32:58" "2006-06-30 12:34:46" "2006-06-30 12:36:55" ...
>  $ countername : Factor w/ 4 levels "Bytes Received/sec",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ countervalue: num    NA  938  816 4213  906 ... 
> 
> What I'd dearly love to do, without looping or lapply-ing through t1 
> and rbinding (too much data for this to finish quickly enough -- this 
> is about 10% of what I'm eventually going to have to manage), is 
> convert t1 to one big dataframe. 
> 
> On the other hand, I admit that I may be going about this wrongly from 
> the start; perhaps there's a better approach? 
> 
> Any pointers would be most gratefully received. 
> 
> Many thanks! 
> 
> 
> -- 
> Regards, 
> 
> Mike Nielsen 
> 

-- 
Regards, 

Mike Nielsen 

______________________________________________ 
R-help at stat.math.ethz.ch mailing list 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html