[R] How to extract same columns from identical dataframes in a list?
peter dalgaard
pdalgd at gmail.com
Tue Feb 9 16:19:00 CET 2016
Like this?
> l <- replicate(3,data.frame(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
> l
[[1]]
  w1 w2
1  2  2
2  3  3
3  1  1
4  4  4

[[2]]
  w1 w2
1  3  4
2  2  2
3  1  3
4  4  1

[[3]]
  w1 w2
1  1  4
2  4  3
3  2  1
4  3  2

> sapply(l,"[[",2)
     [,1] [,2] [,3]
[1,]    2    4    4
[2,]    3    2    3
[3,]    1    3    1
[4,]    4    1    2
Or even
> sapply(l,"[",,2)
     [,1] [,2] [,3]
[1,]    2    4    4
[2,]    3    2    3
[3,]    1    3    1
[4,]    4    1    2
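
For the original question (all week2 columns side by side, then a row-wise mean), the same idea can be sketched by column name rather than by position. This is only an illustration: the list `dfs`, its element names `d1` to `d7`, and the random values are made up here and are not the poster's data.

```r
set.seed(1)  # only so the made-up values are reproducible

# Hypothetical stand-in for the poster's data: 7 data frames, 24 rows x 5 weeks
dfs <- replicate(7,
                 as.data.frame(matrix(rnorm(24 * 5), nrow = 24,
                                      dimnames = list(NULL, paste0("week", 1:5)))),
                 simplify = FALSE)
names(dfs) <- paste0("d", 1:7)

# "[[" with a column name pulls that column out of every data frame;
# sapply() binds the 7 vectors into a 24 x 7 matrix
week2 <- sapply(dfs, "[[", "week2")

# Row-wise (hourly) mean across the 7 data frames
hourly_mean <- rowMeans(week2)   # same result as apply(week2, 1, mean)
```

Selecting by name keeps working if the column order differs between data frames, which positional indexing would not guarantee.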
Notice that if dd[1:24] gives you the 1st column of a list element dd, then dd is not a data frame but rather a matrix, and the indexing semantics are different. In that case, for some unspeakable reason, the empty index does not work and you'll need something like
> l <- replicate(3,cbind(w1=sample(1:4),w2=sample(1:4)), simplify=FALSE)
> sapply(l,"[",TRUE,2)
     [,1] [,2] [,3]
[1,]    4    3    2
[2,]    1    1    4
[3,]    3    2    3
[4,]    2    4    1
Or, brute-force-and-ignorance:
> sapply(l, function(e) e[, 2])
     [,1] [,2] [,3]
[1,]    4    3    2
[2,]    1    1    4
[3,]    3    2    3
[4,]    2    4    1
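
The difference in indexing semantics can be checked directly. A small sketch with made-up objects `df` and `m` (both names are hypothetical): a single index on a data frame selects columns, whereas on a matrix it selects elements in column-major order, which would explain why 1:24 happened to return the first column.

```r
df <- data.frame(w1 = 1:4, w2 = 5:8)  # made-up data frame
m  <- as.matrix(df)                    # same values stored as a matrix

is.data.frame(df)   # TRUE
is.data.frame(m)    # FALSE

# One index on a data frame selects whole columns (result is a data frame)
df[2]

# One index on a matrix selects elements in column-major order,
# so 1:4 happens to be the first column and 5:8 the second
m[1:4]
m[5:8]

# If the list elements turn out to be matrices, converting them first
# lets the data-frame idioms apply
l2 <- lapply(list(m, m), as.data.frame)
w2 <- sapply(l2, "[[", "w2")
```

Checking is.data.frame() on one list element first tells you which of the two sets of idioms applies.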
On 09 Feb 2016, at 10:03 , Wolfgang Waser <waser at frankenfoerder-fg.de> wrote:
> Hi,
>
> sorry if my description was too short / unclear.
>
>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>
> [1]
> week1 week2 week3 ...
> 1 x a m ...
> 2 y b n
> 3 z c o
> . . . .
> . . . .
> . . . .
> 24 . . .
>
>
> [2]
> week1 week2 week3 ...
> 1 x2 a2 m2 ...
> 2 y2 b2 n2
> 3 z2 c2 o2
> . . . .
> . . . .
> . . . .
> 24 . . .
>
>
> [3]
> ...
>
> .
> .
> .
>
>
> [7]
> ...
>
>
>
> I now would like to extract e.g. all week2 columns of all data frames in
> the list and combine them in a new data frame using cbind.
>
> new data frame
>
> week2 ([1]) week2 ([2]) week2 ([3]) ...
> a a2 .
> b b2 .
> c c2 .
> .
> .
> .
>
> I will then do further row-wise calculations using e.g. apply(x,1,mean),
> the result being a vector of 24 values.
>
>
> I have not found a way to extract specific columns of the data frames in
> a list.
>
>
> As mentioned I can use
>
> sapply(list_of_dataframes,"[",1:24)
>
> which will pick the first 24 values (first column) of each data frame in
> the list and arrange them as an array of 24 rows and 7 columns (7 data
> frames are in the list).
> To pick the second column (week2) using sapply I have to use the next 24
> values from 25 to 48:
>
> sapply(list_of_dataframes,"[",25:48)
>
>
> It seems that sapply treats the data frames in the list as vectors. I
> can of course extract all consecutive weeks using consecutive blocks of
> 24 values, but this seems cumbersome.
>
>
> The question remains, how to select specific columns from data frames in
> a list, e.g. all columns 3 of all data frames in the list.
>
>
> Reformatting (unlist(), dim()) in one data frame with one column for
> each week does not help, since I'm not calculating colMeans etc, but
> row-wise calculations using apply(x,1,FUN) ("applying a function to
> margins of an array or matrix").
>
>
>
> Thanks for your help and suggestions!
>
>
> Wolfgang
>
>
>
> On 08/02/16 18:00, Dénes Tóth wrote:
>> Hi,
>>
>> Although you did not provide any reproducible example, it seems you
>> store the same type of values in your data.frames. If this is true, it
>> is much more efficient to store your data in an array:
>>
>> mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
>> b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))
>>
>> myarray <- unlist(mylist, use.names = FALSE)
>> dim(myarray) <- c(nrow(mylist$a), ncol(mylist$a), length(mylist))
>> dimnames(myarray) <- list(hour = rownames(mylist$a),
>> week = colnames(mylist$a),
>> other = names(mylist))
>> # now you can do:
>> mean(myarray[, "week1", "a"])
>>
>> # or:
>> colMeans(myarray)
>>
>>
>> Cheers,
>> Denes
>>
>>
>> On 02/08/2016 02:33 PM, Wolfgang Waser wrote:
>>> Hello,
>>>
>>> I have a list of 7 data frames, each data frame having 24 rows (hour of
>>> the day) and 5 columns (weeks) with a total of 5 x 24 values
>>>
>>> I would like to combine all 7 columns of week 1 (and 2 ...) in a
>>> separate data frame for hourly calculations, e.g.
>>>> apply(new.data.frame,1,mean)
>>>
>>> In some way sapply (lapply) works, but I cannot directly select columns
>>> of the original data frames in the list. As a workaround I have to
>>> select a range of values:
>>>
>>>> sapply(list_of_dataframes,"[",1:24)
>>>
>>> Values 1:24 give the first column, 25:48 the second and so on.
>>>
>>> Is there an easier / more direct way to select for specific columns
>>> instead of selecting a range of values, avoiding loops?
>>>
>>>
>>> Cheers,
>>>
>>> Wolfgang
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> --
> Frankenförder Forschungsgesellschaft mbH
> Dr. Wolfgang Waser
> Wissenschaftsbereich Berlin
> Chausseestraße 10
> 10115 Berlin
> Tel.: +49(0)30 2809 1936
> Fax.: +49(0)30 2809 1940
> E-Mail: waser at frankenfoerder-fg.de
>
> Frankenförder Forschungsgesellschaft mbH (FFG)
> Sitz: Luckenwalde,Amtsgericht Potsdam, HRB: 6499
> Geschäftsführerin: Dipl. Agraring. Doreen Sparborth
> Tel.: +49(0)30 2809 1931, E-Mail: info at frankenfoerder-fg.de
> http://www.frankenfoerder-fg.de
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com