[R] Extract values from multiple lists
Dénes Tóth
toth.denes at ttk.mta.hu
Wed Dec 17 11:46:17 CET 2014
Dear Jeff,
On 12/17/2014 01:46 AM, Jeff Newmiller wrote:
> You are chasing ghosts of performance past, Denes.
In terms of memory efficiency, yes. In terms of CPU time, however, there
can be a significant difference; see below.
> The data.frame
> function causes no problems, and if it is used then the OP would not
> need to presume they know the internal structure of the data frame.
> See below. (I am using R 3.1.2.)
>
> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>
> # get names of the objects
> out_names <- ls(pattern="a[[:digit:]]$")
>
> # amount of memory allocated
> gc(reset=TRUE)
>
> # Explicitly call data frame
> out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )
>
> # No copying.
> gc()
>
> # Your suggested retrieval method
> out3a <- lapply( lapply( out_names, get ), "[[", "x" )
> names( out3a ) <- out_names
> # The "obvious" way to finish the job works fine.
> out3 <- do.call( data.frame, out3a )
BTW, the even more "obvious" as.data.frame() produces the same with an
even more intuitive interface.
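A minimal check of the as.data.frame() point, using small stand-ins for a1, a2 and a3 (5 values instead of 1e6, an arbitrary choice for illustration). It swaps the ls()/get() pair for mget(), which returns a list already named after the objects. Since as.data.frame() on a list dispatches to as.data.frame.list(), which itself builds the result with data.frame(), the two routes should give the same object.

```r
# Small stand-ins for a1, a2, a3 (5 values instead of 1e6)
set.seed(1)
a1 <- list(x = rnorm(5), y = rnorm(5))
a2 <- list(x = rnorm(5), y = rnorm(5))
a3 <- list(x = rnorm(5), y = rnorm(5))

# mget() returns a list named after the objects, so no names() call is needed
out_names <- c("a1", "a2", "a3")
out3a <- lapply(mget(out_names), "[[", "x")

# The explicit do.call() route and the as.data.frame() route
out3 <- do.call(data.frame, out3a)
out4 <- as.data.frame(out3a)
identical(out3, out4)
```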
However, when the list has many elements (i.e., the result has many
columns), converting it to a data.frame can be quite slow. The toy example
above created only a three-element list; let's scale it up a little.
---
# this is not even that large
datlen <- 1e2
listlen <- 1e5
# create a toy list of listlen integer vectors, each of length datlen
mylist <- matrix(seq_len(datlen * listlen),
                 nrow = datlen, ncol = listlen)
mylist <- lapply(seq_len(ncol(mylist)), function(i) mylist[, i])
names(mylist) <- paste0("V", seq_len(listlen))
# define the more efficient function (requires the data.table package)
# note that I put class(x) first: that assignment makes R copy x, so
# setattr then modifies the copy, not the attributes of the original
# input (see ?setattr, you have to be careful)
setAttrib <- function(x) {
class(x) <- "data.frame"
data.table::setattr(x, "row.names", seq_along(x[[1]]))
x
}
# benchmarking
# (we do not need microbenchmark here, the differences are
# extremely large) - on my machine: do.call 9.4 s,
# as.data.frame 8.1 s, setAttrib 0.15 s
gc(reset=TRUE)
system.time(df1 <- do.call(data.frame, mylist))
gc()
system.time(df2 <- as.data.frame(mylist))
gc()
system.time(df3 <- setAttrib(mylist))
gc()
# check results
identical(df1, df2)
identical(df1, df3)
----
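If adding a data.table dependency is not an option, a base-R sketch of the same idea might look like this. It is not Denes's code, and list_to_df() is a hypothetical name. It attaches the class and the row names in one structure() call; .set_row_names(n) is the base helper that produces the compact c(NA_integer_, -n) form that data.frame() uses for automatic row names. Unlike setattr(), structure() works on a copy, so the input list is never modified; how cheap that copy is depends on R's copying internals, so the timings are not guaranteed to match setAttrib().

```r
# Hypothetical helper: base-R variant of setAttrib(), no data.table needed
list_to_df <- function(x) {
  stopifnot(is.list(x), length(x) > 0L, !is.null(names(x)))
  # all columns must have the same length for a valid data.frame
  n <- unique(vapply(x, length, integer(1)))
  stopifnot(length(n) == 1L)
  # compact automatic row names 1..n, as data.frame() itself stores them
  structure(x, class = "data.frame", row.names = .set_row_names(n))
}

# Small version of the benchmark data: 50 integer columns of length 10
datlen <- 10L
listlen <- 50L
m <- matrix(seq_len(datlen * listlen), nrow = datlen, ncol = listlen)
mylist <- lapply(seq_len(ncol(m)), function(i) m[, i])
names(mylist) <- paste0("V", seq_len(listlen))

df_fast <- list_to_df(mylist)
df_ref <- as.data.frame(mylist)
identical(df_fast, df_ref)  # compare with the data.frame() route
is.data.frame(mylist)       # the input list itself is left as a plain list
```

In R 4.0.0 and later, base R also provides list2DF() for building a data.frame from a list of columns.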
Of course for small datasets, one should use the built-in and safe
functions (either do.call or as.data.frame). BTW, for the original
three-element list, these are even faster than the workaround.
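For the original question (pulling pop.inf.r out of scen1, ..., scenN), a minimal base-R sketch might look like the following. It assumes each pop.inf.r is a list of length-1 numeric values, as the summary() output in the original post suggests (Length 2000, Mode list); make_scen() and the generated values are hypothetical stand-ins. Passing such a list straight to data.frame() would turn each of its elements into a separate column, which is why the sketch flattens it with unlist() first.

```r
# Hypothetical stand-ins for the OP's scenario outputs: 'pop.inf.r' is a
# list of 2000 scalars, mirroring summary(); the real contents are unknown
set.seed(42)
make_scen <- function() list(aql = 0.01, pop.inf.r = as.list(runif(2000)))
scen1 <- make_scen()
scen2 <- make_scen()
scen3 <- make_scen()

# Build the object names explicitly rather than with ls(): ls() sorts
# alphabetically, so "scen10" would come before "scen2"
N <- 3
scen_names <- paste0("scen", seq_len(N))

# Extract the same component from every scenario and flatten each list
# of scalars into a numeric vector (assumes length-1 elements)
cols <- lapply(mget(scen_names), function(s) {
  unlist(s[["pop.inf.r"]], use.names = FALSE)
})

# Column names come from the list names: scen1, scen2, scen3
new.df <- as.data.frame(cols)
head(new.df)
```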
All the best,
Denes
>
> # No copying... well, you do end up with a new list in out3, but the
> # data itself doesn't get copied.
> gc()
>
>
> On Tue, 16 Dec 2014, Dénes Tóth wrote:
>
>> On 12/16/2014 06:06 PM, SH wrote:
>>> Dear List,
>>>
>>> I hope this posting is not redundant. I have several list outputs
>>> with the
>>> same components. I ran a function with three different scenarios below
>>> (e.g., scen1, scen2, and scen3,...,scenN). I would like to extract the
>>> same components and group them as a data frame. For example,
>>> pop.inf.r1 <- scen1[['pop.inf.r']]
>>> pop.inf.r2 <- scen2[['pop.inf.r']]
>>> pop.inf.r3 <- scen3[['pop.inf.r']]
>>> ...
>>> pop.inf.rN<-scenN[['pop.inf.r']]
>>> new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)
>>>
>>> My final output would be 'new.df'. Could you help me do that
>>> efficiently?
>>
>> If efficiency is of concern, do not use data.frame() but create a list
>> and add the required attributes with data.table::setattr (the setattr
>> function of the data.table package). (You can also consider creating a
>> data.table instead of a data.frame.)
>>
>> # some largish lists
>> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>
>> # amount of memory allocated
>> gc(reset=TRUE)
>>
>> # get names of the objects
>> out_names <- ls(pattern="a[[:digit:]]$")
>>
>> # create a list
>> out <- lapply(lapply(out_names, get), "[[", "x")
>>
>> # note that no copying occurred
>> gc()
>>
>> # decorate the list
>> data.table::setattr(out, "names", out_names)
>> data.table::setattr(out, "row.names", seq_along(out[[1]]))
>> class(out) <- "data.frame"
>>
>> # still no copy
>> gc()
>>
>> # output
>> head(out)
>>
>>
>> HTH,
>> Denes
>>
>>
>>>
>>> Thanks in advance,
>>>
>>> Steve
>>>
>>> P.S.: Below are some examples of summary outputs.
>>>
>>>
>>>> summary(scen1)
>>>                 Length Class  Mode
>>> aql                  1 -none- numeric
>>> rql                  1 -none- numeric
>>> alpha                1 -none- numeric
>>> beta                 1 -none- numeric
>>> n.sim                1 -none- numeric
>>> N                    1 -none- numeric
>>> n.sample             1 -none- numeric
>>> n.acc                1 -none- numeric
>>> lot.inf.r            1 -none- numeric
>>> pop.inf.n         2000 -none- list
>>> pop.inf.r         2000 -none- list
>>> pop.decision.t1   2000 -none- list
>>> pop.decision.t2   2000 -none- list
>>> sp.inf.n          2000 -none- list
>>> sp.inf.r          2000 -none- list
>>> sp.decision       2000 -none- list
>>>> summary(scen2)
>>>                 Length Class  Mode
>>> aql                  1 -none- numeric
>>> rql                  1 -none- numeric
>>> alpha                1 -none- numeric
>>> beta                 1 -none- numeric
>>> n.sim                1 -none- numeric
>>> N                    1 -none- numeric
>>> n.sample             1 -none- numeric
>>> n.acc                1 -none- numeric
>>> lot.inf.r            1 -none- numeric
>>> pop.inf.n         2000 -none- list
>>> pop.inf.r         2000 -none- list
>>> pop.decision.t1   2000 -none- list
>>> pop.decision.t2   2000 -none- list
>>> sp.inf.n          2000 -none- list
>>> sp.inf.r          2000 -none- list
>>> sp.decision       2000 -none- list
>>>> summary(scen3)
>>>                 Length Class  Mode
>>> aql                  1 -none- numeric
>>> rql                  1 -none- numeric
>>> alpha                1 -none- numeric
>>> beta                 1 -none- numeric
>>> n.sim                1 -none- numeric
>>> N                    1 -none- numeric
>>> n.sample             1 -none- numeric
>>> n.acc                1 -none- numeric
>>> lot.inf.r            1 -none- numeric
>>> pop.inf.n         2000 -none- list
>>> pop.inf.r         2000 -none- list
>>> pop.decision.t1   2000 -none- list
>>> pop.decision.t2   2000 -none- list
>>> sp.inf.n          2000 -none- list
>>> sp.inf.r          2000 -none- list
>>> sp.decision       2000 -none- list
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---------------------------------------------------------------------------