[R] Manage an unknown and variable number of data frames

Mark Knecht markknecht at gmail.com
Sun Sep 13 19:46:07 CEST 2009


Hi David,
   Thanks. This has really helped me clarify my needs, and the results
are now far closer to what I need. I've removed the things I didn't
really need from the data frame and clarified the subset expression.

   The one place where I still need some help is in the automatic
creation of the subset selection criteria. My goal is to only edit
CutList and then everything else falls out of that. In the following
code I've added "e" to the CutList and would need "e" added to the
subset logic. That is, the current criteria

as.data.frame(CutResults)$a == CutTable$a[x] &
as.data.frame(CutResults)$b == CutTable$b[x]

should be changed to:

as.data.frame(CutResults)$a == CutTable$a[x] &
as.data.frame(CutResults)$b == CutTable$b[x] &
as.data.frame(CutResults)$e == CutTable$e[x]

   I suspect this could possibly be done with another lapply (somehow)
using each of the elements of names(CutList) to replace the a, b & e
in the new equation?

   Current code follows.

   I really appreciate your help.

Cheers,
Mark





dat = data.frame(
	a=round(runif(100,-20,30),2),
	b=round(runif(100,-40,50),2),
	c=1:4,
	d=1:5,
	e=1:20
	)

CutList = list(
	a=c(-Inf,-10,10,Inf),
	b=c(-Inf,0,20,Inf),
	e=c(-Inf,13,Inf)
	)

CutResults = mapply(cut,x=dat[,names(CutList)],CutList,SIMPLIFY=FALSE)
CutTable = as.data.frame(table(CutResults))

work.list <- lapply(
			1:dim(CutTable)[1],

			function(x) subset(dat,
#						select=names(CutList),
						as.data.frame(CutResults)$a == CutTable$a[x] &
						as.data.frame(CutResults)$b == CutTable$b[x]

						)
			)

CutTable
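
One possible way to build the selection criteria from names(CutList), so that only CutList needs editing, is sketched below. It is a minimal sketch rather than a definitive answer: it loops over names(CutList) with lapply, compares each cut column against the matching column of CutTable, and combines the tests with Reduce. The names rows_for and add_sum are hypothetical helpers introduced only for illustration, and set.seed is added only to make the example reproducible.

```r
# Same example data and cut definitions as in the post.
set.seed(1)  # assumption: added only for reproducibility
dat <- data.frame(
  a = round(runif(100, -20, 30), 2),
  b = round(runif(100, -40, 50), 2),
  c = 1:4,
  d = 1:5,
  e = 1:20
)

CutList <- list(
  a = c(-Inf, -10, 10, Inf),
  b = c(-Inf, 0, 20, Inf),
  e = c(-Inf, 13, Inf)
)

CutResults   <- mapply(cut, x = dat[, names(CutList)], CutList, SIMPLIFY = FALSE)
CutResultsDF <- as.data.frame(CutResults)
CutTable     <- as.data.frame(table(CutResults))

# Hypothetical helper: for row x of CutTable, test every column named in
# CutList and combine the per-column tests with a logical AND.
rows_for <- function(x) {
  tests <- lapply(names(CutList), function(nm) {
    as.character(CutResultsDF[[nm]]) == as.character(CutTable[[nm]][x])
  })
  Reduce(`&`, tests)
}

# One data frame per row of CutTable, named by that row's row name.
work.list <- lapply(seq_len(nrow(CutTable)),
                    function(x) dat[rows_for(x), , drop = FALSE])
names(work.list) <- rownames(CutTable)

# Hypothetical example of sending each data frame to another function
# that adds a calculated column.
add_sum <- function(df) {
  df$ab_sum <- df$a + df$b
  df
}
work.list <- lapply(work.list, add_sum)
```

With this arrangement, adding or removing an entry in CutList changes the number of tests automatically, and work.list[["5"]] corresponds to row 5 of CutTable. Some elements may be empty data frames when a combination of ranges has no matching rows.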





On Sat, Sep 12, 2009 at 9:35 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Sep 12, 2009, at 10:13 PM, Mark Knecht wrote:
>
>> Hi,
>>  In the code below I create a small data.frame (dat) and then cut it
>> into different groups using CutList. The lists in CutList allow me to
>> choose whatever columns I want from dat and allow me to cut it into
>> any number of groups by changing the lists. It seems to work OK but
>> when I'm done I have a variable number of data frames that I need to
>> do further operations on and I don't know how to manage them as a
>> collection.
>
> List processing.
>
>>
>>  How do experienced R coders handle keeping all this straight so that
>> if I add another column from dat and more groups in the cuts it all
>> stays straight? I need to send each data frame to another function to
>> add columns of specific data calculations to each of them.
>>
>>  Best for me (I think) would be to enumerate each data frame using
>> the row.name number from CutTable if possible, but that's just my
>> thought. If each data frame became an element of CutTable then I'd
>> always know where they are. Really I'm needing to get a handle on
>> keeping a variable and unknown number of these things straight.
>>
>> Thanks,
>> Mark
>>
>> dat = data.frame(
>>        a=round(runif(100,-20,30),2),
>>        b=round(runif(100,-40,50),2)
>>        )
>>
>> # Give each cut list a name matching the column in dat that you
>> # want to use as criteria for making the cut.
>> # Create any number of cuts in each row.
>>
>> CutList = list(
>>        a=c(-Inf,-10,10,Inf),
>>        b=c(-Inf,0,20,Inf)
>>        )
>>
>> CutResults = mapply(cut,x=dat[,names(CutList)],CutList,SIMPLIFY=FALSE)
>> CutTable = as.data.frame(table(CutResults))
>>
>> CutResultsDF = as.data.frame(CutResults)
>> head(CutResultsDF, n=15)
>>
>> dat$aRange = CutResultsDF$a
>> dat$bRange = CutResultsDF$b
>> head(dat, 15)
>
> You could have gotten the same labeling of columns into categories with a
> combination of ave and cut.
>
>> dat$arng2 <- ave(dat$a, FUN=function(x) cut(x, breaks=CutList$a) )
>> dat
>         a      b     aRange    bRange arng2
> 1   -10.45  43.30 (-Inf,-10] (20, Inf]     1
> 2     9.09 -33.66   (-10,10]  (-Inf,0]     2
> 3    29.27  18.34  (10, Inf]    (0,20]     3
> 4    28.92  46.55  (10, Inf] (20, Inf]     3
> 5     2.07  -8.23   (-10,10]  (-Inf,0]     2
> 6    18.28 -35.13  (10, Inf]  (-Inf,0]     3
> 7   -16.26  40.59 (-Inf,-10] (20, Inf]     1
> snip
>
>
>>
>>
>> # I don't want to do the following as it doesn't
>> # get managed automatically.
>>
>
> It is possibly unclear what you are hoping to accomplish with that
> subset(subset(.)) construction. Are you trying to accomplish what a logical
> conjunction for subset= , coupled with a select= parameter would do inside a
> single subset?
>
>> subset(dat, aRange==CutTable$a[1] & bRange==CutTable$b[1],
>> select=c("a","b") )
>        a      b
> 26 -17.50 -18.46
> 28 -15.48 -34.37
> 31 -10.04 -21.55
> 38 -11.73 -29.40
> 46 -18.28 -17.42
> 95 -11.62 -22.94
> 96 -12.16  -1.57
> 97 -15.44 -19.89
>
>> Subset1 = subset(subset(dat, aRange==CutTable$a[1]),
>> bRange==CutTable$b[1])[1:2]
>> Subset2 = subset(subset(dat, aRange==CutTable$a[2]),
>> bRange==CutTable$b[2])[1:2]
>> Subset3 = subset(subset(dat, aRange==CutTable$a[3]),
>> bRange==CutTable$b[3])[1:2]
>> Subset4 = subset(subset(dat, aRange==CutTable$a[4]),
>> bRange==CutTable$b[4])[1:2]
>
> You could "automate" that with
>> work.list <- lapply(1:4, function(x) subset(dat, aRange==CutTable$a[x] &
>> bRange==CutTable$b[x], select=c("a","b")  )  )
>> work.list[[1]]  # first element of a 4 element list
>        a      b
> 26 -17.50 -18.46
> 28 -15.48 -34.37
> 31 -10.04 -21.55
> 38 -11.73 -29.40
> 46 -18.28 -17.42
> 95 -11.62 -22.94
> 96 -12.16  -1.57
> 97 -15.44 -19.89
>
>
>> Subset1
>> Subset2
>> Subset3
>> Subset4
>>
>> CutTable
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>



