[R] Using by() and stacking back sub-data frames to one data frame
David Winsemius
dwinsemius at comcast.net
Thu Jun 25 15:07:36 CEST 2009
Your request for a more general approach is precisely the reason that
Hadley Wickham wrote the plyr package. He describes a
split-apply-combine strategy for a variety of data structures, along
with the tools that implement it, here:
http://had.co.nz/plyr/plyr-intro-090510.pdf
The argument to the "by" step is a column name (given as a character
string) rather than a list or object as it would be in tapply or split.
I is just the identity function, which stands in for
function(x) return(x) in your code.
library(plyr)
> ddply(y, "month", fun=I)
suid month esr
1 1074034 1 2
2 1074034 1 1
3 1074034 1 2
4 1074034 1 9
5 1123003 1 2
6 1074034 2 2
7 1074034 2 1
8 1074034 2 2
9 1074034 2 9
10 1123003 2 2
11 1074034 3 2
12 1074034 3 1
13 1074034 3 2
14 1074034 3 9
15 1123003 3 2
16 1074034 12 6
17 1074034 12 1
18 1074034 12 2
19 1074034 12 9
20 1123003 12 2
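
If you would rather stay in base R, the same split-apply-combine idea
can be sketched without plyr. This is only a minimal illustration using
the example data from your message; the names pieces, stacked and
stacked_by are my own, and the identity function is a placeholder for
whatever per-group work your real script does.

```r
# Example data copied from the original question
y <- data.frame(suid  = c(rep(1074034, 16), rep(1123003, 4)),
                month = rep(c(12, 1, 2, 3), 5),
                esr   = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))

# split: one sub-data frame per value of month
pieces <- split(y, y$month)

# apply: identity here; substitute the real per-group function
pieces <- lapply(pieces, function(x) x)

# combine: stack the sub-data frames back into one data frame
stacked <- do.call(rbind, pieces)
rownames(stacked) <- NULL   # drop the "group.row" style row names

# The object returned by by() is itself a list, so it can be
# stacked the same way
stacked_by <- do.call(rbind, by(y, y$month, function(x) x))
```

The rows come back grouped by month, so the result matches y only up
to row order.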
On Jun 24, 2009, at 11:34 PM, Stephan Lindner wrote:
> Dear all,
>
>
> I have code in which I subset a data frame to match entries within
> levels of a factor (actually, the full script uses three different
> factors to do that). I'm very happy with the precision with which I
> can work with R, but since I loop over factor levels, and the data
> frame is big, the process is slow. So I've been trying to speed up the
> process using by(), but I got stuck at the point where I want to stack
> back the sub-data frames, and I was wondering whether someone could
> help me out.
>
> Here is an example:
>
> <--
>
>> y <- data.frame(suid = c(rep(1074034,16),rep(1123003,4)),
> month = rep(c(12,1,2,3),5),
> esr = c(6,2,2,2,1,1,1,1,2,2,2,2,9,9,9,9,2,2,2,2))
>
>
>> by(y,y$month,function(x)return(x))
>
> y$month: 1
> suid month esr
> 2 1074034 1 2
> 6 1074034 1 1
> 10 1074034 1 2
> 14 1074034 1 9
> 18 1123003 1 2
> ------------------------------------------------------------
> y$month: 2
> suid month esr
> 3 1074034 2 2
> 7 1074034 2 1
> 11 1074034 2 2
> 15 1074034 2 9
> 19 1123003 2 2
> ------------------------------------------------------------
> y$month: 3
> suid month esr
> 4 1074034 3 2
> 8 1074034 3 1
> 12 1074034 3 2
> 16 1074034 3 9
> 20 1123003 3 2
> ------------------------------------------------------------
> y$month: 12
> suid month esr
> 1 1074034 12 6
> 5 1074034 12 1
> 9 1074034 12 2
> 13 1074034 12 9
> 17 1123003 12 2
>
> -->
>
> What I would like to do is stack these four data frames back into one
> data frame, which in this simple example would just be y. I tried
> unlist(), unclass() and rbind(), but none of them worked.
>
>
> Thanks a lot,
>
>
>
> Stephan
> --
> -----------------------
> Stephan Lindner
> University of Michigan
>
David Winsemius, MD
Heritage Laboratories
West Hartford, CT