[R] Manage an unknown and variable number of data frames
David Winsemius
dwinsemius at comcast.net
Sun Sep 13 06:35:14 CEST 2009
On Sep 12, 2009, at 10:13 PM, Mark Knecht wrote:
> Hi,
> In the code below I create a small data.frame (dat) and then cut it
> into different groups using CutList. The lists in CutList allow me to
> choose whatever columns I want from dat and allow me to cut it into
> any number of groups by changing the lists. It seems to work OK but
> when I'm done I have a variable number of data frames that I need to
> do further operations on and I don't know how to manage them as a
> collection.
List processing: keep the data frames together as elements of one list
and work on that list with lapply().
>
> How do experienced R coders handle keeping all this straight so that
> if I add another column from dat and more groups in the cuts it all
> stays straight? I need to send each data frame to another function to
> add columns of specific data calculations to each of them.
>
> Best for me (I think) would be to enumerate each data frame using
> the row.name number from CutTable if possible, but that's just my
> thought. If each data frame became an element of CutTable then I'd
> always know where they are. Really I'm needing to get a handle on
> keeping a variable and unknown number of these things straight.
>
> Thanks,
> Mark
>
> dat = data.frame(
> a=round(runif(100,-20,30),2),
> b=round(runif(100,-40,50),2)
> )
>
> # Give each cut list a name matching the column in dat that you
> # want to use as criteria for making the cut.
> # Create any number of cuts in each row.
>
> CutList = list(
> a=c(-Inf,-10,10,Inf),
> b=c(-Inf,0,20,Inf)
> )
>
> CutResults = mapply(cut,x=dat[,names(CutList)],CutList,SIMPLIFY=FALSE)
> CutTable = as.data.frame(table(CutResults))
>
> CutResultsDF = as.data.frame(CutResults)
> head(CutResultsDF, n=15)
>
> dat$aRange = CutResultsDF$a
> dat$bRange = CutResultsDF$b
> head(dat, 15)
You could have gotten a similar grouping of the values into categories
with a combination of ave and cut (as integer codes rather than
interval labels):
> dat$arng2 <- ave(dat$a, FUN=function(x) cut(x, breaks=CutList$a) )
> dat
       a      b     aRange    bRange arng2
1 -10.45  43.30 (-Inf,-10] (20, Inf]     1
2   9.09 -33.66   (-10,10]  (-Inf,0]     2
3  29.27  18.34  (10, Inf]    (0,20]     3
4  28.92  46.55  (10, Inf] (20, Inf]     3
5   2.07  -8.23   (-10,10]  (-Inf,0]     2
6  18.28 -35.13  (10, Inf]  (-Inf,0]     3
7 -16.26  40.59 (-Inf,-10] (20, Inf]     1
snip
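
To illustrate the difference (a minimal sketch, not part of the
original exchange): cut() on its own returns a factor carrying the
interval labels, and as.integer() on that factor gives the 1, 2, 3
codes seen in the arng2 column above. The seed and the names breaks_a,
rng_factor and rng_code are assumptions made only for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
dat <- data.frame(a = round(runif(100, -20, 30), 2))
breaks_a <- c(-Inf, -10, 10, Inf)   # same break points as CutList$a

# cut() returns a factor; each value is labelled with its interval
rng_factor <- cut(dat$a, breaks = breaks_a)

# the integer codes are the positions of those labels among the levels
rng_code <- as.integer(rng_factor)

# look at values, labels and codes side by side
head(data.frame(a = dat$a, label = rng_factor, code = rng_code))
```

The labels can always be recovered from the codes with
levels(rng_factor)[rng_code].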
>
>
> # I don't want to do the following as it doesn't
> # get managed automatically.
>
It is not clear to me what you are hoping to accomplish with the
subset(subset(.)) construction below. Are you trying to do what a
logical conjunction in the subset= argument, coupled with a select=
argument, would do inside a single subset() call?
> subset(dat, aRange==CutTable$a[1] & bRange==CutTable$b[1],
+        select=c("a","b") )
        a      b
26 -17.50 -18.46
28 -15.48 -34.37
31 -10.04 -21.55
38 -11.73 -29.40
46 -18.28 -17.42
95 -11.62 -22.94
96 -12.16  -1.57
97 -15.44 -19.89
> Subset1 = subset(subset(dat, aRange==CutTable$a[1]), bRange==CutTable$b[1])[1:2]
> Subset2 = subset(subset(dat, aRange==CutTable$a[2]), bRange==CutTable$b[2])[1:2]
> Subset3 = subset(subset(dat, aRange==CutTable$a[3]), bRange==CutTable$b[3])[1:2]
> Subset4 = subset(subset(dat, aRange==CutTable$a[4]), bRange==CutTable$b[4])[1:2]
You could "automate" that with
> work.list <- lapply(1:4, function(x) subset(dat,
+     aRange==CutTable$a[x] & bRange==CutTable$b[x], select=c("a","b") ) )
> work.list[[1]]  # first element of a 4 element list
        a      b
26 -17.50 -18.46
28 -15.48 -34.37
31 -10.04 -21.55
38 -11.73 -29.40
46 -18.28 -17.42
95 -11.62 -22.94
96 -12.16  -1.57
97 -15.44 -19.89
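
The 1:4 above is hard-coded. A sketch of one way to let the list grow
with CutTable, assuming the aRange/bRange columns built earlier: loop
over seq_len(nrow(CutTable)), name the elements by CutTable's row names,
then send every data frame through the same function with lapply. The
seed and the add_sum() helper are hypothetical, invented only for
illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
dat <- data.frame(
  a = round(runif(100, -20, 30), 2),
  b = round(runif(100, -40, 50), 2)
)
CutList <- list(a = c(-Inf, -10, 10, Inf), b = c(-Inf, 0, 20, Inf))

# same steps as in the original post
CutResults <- mapply(cut, x = dat[, names(CutList)], CutList, SIMPLIFY = FALSE)
CutTable   <- as.data.frame(table(CutResults))
dat$aRange <- CutResults$a
dat$bRange <- CutResults$b

# one list element per row of CutTable, however many rows there are
work.list <- lapply(seq_len(nrow(CutTable)), function(i) {
  subset(dat, aRange == CutTable$a[i] & bRange == CutTable$b[i],
         select = c("a", "b"))
})
names(work.list) <- rownames(CutTable)

# hypothetical per-group calculation: add one derived column
add_sum <- function(df) {
  df$ab_sum <- df$a + df$b
  df
}
work.list <- lapply(work.list, add_sum)

# group sizes, in the same order as the rows of CutTable
sapply(work.list, nrow)
```

With this naming, work.list[["3"]] always corresponds to row 3 of
CutTable, no matter how many cuts are defined.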
> Subset1
> Subset2
> Subset3
> Subset4
>
> CutTable
>
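
A different base-R route to the same collection, offered as a sketch
under stated assumptions rather than as what was suggested above:
split() with a list of factors returns one data frame per combination
of ranges in a single call, so nothing has to be enumerated by hand.
The seed and the names ranges and groups are assumptions for this
example.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
dat <- data.frame(
  a = round(runif(100, -20, 30), 2),
  b = round(runif(100, -40, 50), 2)
)
CutList <- list(a = c(-Inf, -10, 10, Inf), b = c(-Inf, 0, 20, Inf))

# one factor per entry of CutList, named after the column it cuts
ranges <- Map(function(col, brk) cut(dat[[col]], breaks = brk),
              names(CutList), CutList)

# split() crosses the factors; drop = FALSE keeps empty combinations
groups <- split(dat, ranges, drop = FALSE)

length(groups)        # number of range combinations
sapply(groups, nrow)  # rows in each combination
```

Adding a column to CutList or more break points changes the number of
elements in groups without any other change to the code.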
David Winsemius, MD
Heritage Laboratories
West Hartford, CT