[R] Fast nested List->data.frame

Greg Hirson ghirson at ucdavis.edu
Tue Jan 5 09:40:37 CET 2010


Dieter,

I'd approach this by first building a matrix, then converting it to a data
frame and restoring the column types. I'm sure there is a way to do it in
one step with structure(). Operations on matrices are usually faster than
on data frames.
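The type-restoring step is needed because unlist() has to find one common
type for all fields of a mixed record, and here that type is character. A
minimal illustration, using one record shaped like the elements of d:

```r
# One record with the same fields as the elements of d
rec <- list(pH = 3, marker = TRUE, position = "A")

# unlist() needs a common type for 3, TRUE and "A"; character wins
flat <- unlist(rec)
print(flat)
print(class(flat))  # "character"

# A matrix built from unlist() therefore holds strings, and each column
# has to be converted back to its intended type afterwards
as.numeric(flat[["pH"]])      # 3
as.logical(flat[["marker"]])  # TRUE
```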


len <- 100000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)

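toDF <- function(alist){
  # unlist() flattens all records into one character vector
  # ("3", "TRUE", "A", "3", ...), so every matrix column is character
  d.matrix <- matrix(unlist(alist), ncol = 3, byrow = TRUE)
  d.df <- as.data.frame(d.matrix)
  names(d.df) <- c('pH', 'marker', 'position')

  # Before R 4.0.0, as.data.frame() turns character columns into factors,
  # and as.numeric() on a factor returns the level codes, not the values.
  # Going through as.character() first recovers the intended values.
  d.df$pH <- as.numeric(as.character(d.df$pH))
  d.df$marker <- as.logical(as.character(d.df$marker))
  return(d.df)
}

On my system:
system.time(b <- toDF(d))

    user  system elapsed
   0.560   0.033   0.592

and

head(b)

   pH marker position
1  3   TRUE        A
2  3   TRUE        A
3  3   TRUE        A
4  3   TRUE        A
5  3   TRUE        A
6  3   TRUE        A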

and

sapply(b, class)

        pH    marker  position
"numeric" "logical"  "factor"
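For comparison, a sketch of a column-wise alternative that never goes
through a character matrix: each field is pulled out of every record with
vapply(), which also checks its type, and the columns are assembled in one
step with structure(). This is one possible reading of the "structure in
one step" idea above, not a tested replacement; it assumes every record has
exactly the fields pH (numeric), marker (logical) and position (character),
toDF2 is a hypothetical name, and its speed relative to toDF is untested
here.

```r
# Hypothetical helper; assumes each record is list(pH=, marker=, position=)
toDF2 <- function(alist) {
  # Extract one field from every record; vapply() errors if a value
  # does not match the declared type and length
  pH       <- vapply(alist, function(r) r[["pH"]], numeric(1))
  marker   <- vapply(alist, function(r) r[["marker"]], logical(1))
  position <- vapply(alist, function(r) r[["position"]], character(1))

  # Build the data frame directly; row.names = c(NA, -n) is R's compact
  # form for automatic row names 1..n
  structure(list(pH = pH, marker = marker, position = position),
            class = "data.frame",
            row.names = c(NA, -length(alist)))
}

len <- 100000
d <- replicate(len, list(pH = 3, marker = TRUE, position = "A"), FALSE)
b2 <- toDF2(d)
str(b2)
```

Because the column types come straight from the records, no as.numeric() or
as.logical() clean-up is needed, and position stays character in any R
version.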


I hope this helps,

Greg

sessionInfo()   ##old, I know.
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] cimis_0.1-3     RLastFM_0.1-4   RCurl_0.98-1    bitops_1.0-4.1  XML_2.5-3
[6] lattice_0.17-22

loaded via a namespace (and not attached):
[1] grid_2.9.0



On 1/4/10 11:43 PM, Dieter Menne wrote:
> I have very large data sets given in a format similar to d below. Converting
> these to a data frame is a bottleneck in my application. My fastest version
> is given below, but it looks clumsy to me.
>
> Any ideas?
>
> Dieter
>
> # -----------------------
> len = 100000
> d = replicate(len, list(pH = 3,marker = TRUE,position = "A"),FALSE)
> # Data are given as d
>
> # preallocate vectors of the final types
> pH = numeric(len)
> marker = logical(len)
> position = character(len)
>
> system.time(
> {
>              for (i in 1:len)
>              {
>                d1 = d[[i]]
>                #Assign to vectors
>                pH[i] = d1[[1]]
>                marker[i] = d1[[2]]
>                position[i] = d1[[3]]
>              }
>          # combine vectors
>          pHAll = data.frame(pH,marker,position)
> }
> )
>

-- 
Greg Hirson
ghirson at ucdavis.edu

Graduate Student
Agricultural and Environmental Chemistry

1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616



More information about the R-help mailing list