[R] function to filter identical data.frames using less than (<) and greater than (>)

Karl Brand k.brand at erasmusmc.nl
Thu Dec 6 18:41:28 CET 2012


My problem is that using "[" every time I want to extract my data of 
interest is cumbersome and verbose for the next person who has to read 
my code. Since my extractions are always on the same columns and depend 
on either "<", ">" or neither, a wrapper function, or perhaps a function 
other than "[", will likely solve my problem. Indeed, Rui's example 
achieves exactly what I wanted. Keep in mind that my grasp of R remains 
limited, and you might think my problem is more complex than it is. So 
I'm only inviting further solutions to this problem for the sake of 
improving my grasp of R. You certainly have my understanding should this 
go beyond what you want to invest your time in :)
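
For readers following along, here is a minimal sketch of the kind of wrapper described above. It is not Rui's code, which is not reproduced in this message. The name filter_cols() and the condition syntax (strings such as "< 0", with NA meaning "do not filter on this column") are assumptions made for illustration, and the data frame is passed directly rather than by name.

```r
# Hypothetical helper: filter_cols() is an illustrative name, not from the thread.
# Each condition is a string such as "< 0" or "> 1"; NA (the default) skips that column.
filter_cols <- function(data, A = NA, B = NA, C = NA) {
  conds <- list(A = A, B = B, C = C)
  keep <- rep(TRUE, nrow(data))
  for (col in names(conds)) {
    cond <- conds[[col]]
    if (is.na(cond)) next  # no filter requested on this column
    # Split the condition into an operator ("<" or ">") and a number.
    pattern <- "^[[:space:]]*([<>])[[:space:]]*(-?[0-9.]+)[[:space:]]*$"
    m <- regmatches(cond, regexec(pattern, cond))[[1]]
    if (length(m) != 3) stop("cannot parse condition: ", cond)
    op <- match.fun(m[2])  # look up the operator as an ordinary function
    keep <- keep & op(data[[col]], as.numeric(m[3]))
  }
  # which() drops rows where a comparison returned NA, as subset() would.
  data[which(keep), , drop = FALSE]
}

set.seed(1)
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))
egsub  <- filter_cols(eg, A = "< 0", B = "< 1", C = "> 0")
egsub2 <- filter_cols(eg, A = "> 1", B = "> 0")
```

Using match.fun() avoids do.call() and eval(parse()), and the NA defaults avoid a separate ifelse for each column.
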

No less, your example code:

eg$grpcol[with(eg, grpcol != "Default" & A < 1 & B < 1)] <- "ABTooLow"

already provides educational material for me, thank you.
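
A rough sketch of how that statement might fit into the grouping-then-aggregating approach Jeff describes, under stated assumptions: the starting label "Unassigned", the cut() breaks and labels, and the choice of mean(D) as the summary are all illustrative and do not come from the thread.

```r
set.seed(42)
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))

# Assumed starting label; the thread does not say how grpcol is initialised.
eg$grpcol <- "Unassigned"

# Logical indexed assignment, as in the quoted statement.
eg$grpcol[with(eg, grpcol != "Default" & A < 1 & B < 1)] <- "ABTooLow"

# Alternative grouping column built with cut(); breaks and labels are illustrative.
eg$Abin <- cut(eg$A, breaks = c(-Inf, 0, 1, Inf),
               labels = c("A<=0", "0<A<=1", "A>1"))

# One summary per group replaces a separate "[" extraction for each case.
by_grp <- aggregate(D ~ grpcol, data = eg, FUN = mean)
by_bin <- aggregate(D ~ Abin, data = eg, FUN = mean)
```
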

Chrs, K

On 06/12/12 18:00, Jeff Newmiller wrote:
> You ask me to provide code when you have only described your solution rather than your problem. That limits my options more than I care to allow for investing my time.
>
> When I think of problems that require repetitive subsetting I tend to look for solutions involving aggregation (?aggregate, ?plyr::ddply), which requires creating one or more grouping columns which can be formulated with the cut function or with logical indexed assignment (e.g. a sequence of statements something like eg$grpcol[with(eg,grpcol!="Default" & A<1 & B<1)] <- "ABTooLow").
>
> So... what is your problem?
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                        Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Karl Brand <k.brand at erasmusmc.nl> wrote:
>
>> Hi Jeff,
>>
>> Subset is indeed what's required here. But using it every time it's
>> needed was generating excessive amounts of obtuse code. So for the sake
>> of clarity and convenience I wanted a wrapper function to replace these
>> repetitious subsets.
>>
>> Although Rui's example works just fine, I'd love to see any idiomatic
>> ways you might attempt this (also for the sake of improving my grasp of R).
>>
>> Cheers,
>>
>> Karl
>>
>>
>>
>>
>> On 06/12/12 15:57, Jeff Newmiller wrote:
>>> You have not indicated why the subset function is insufficient for your needs...
>>>
>>>
>>> Karl Brand <k.brand at erasmusmc.nl> wrote:
>>>
>>>> Esteemed UseRs,
>>>>
>>>> I've got many biggish data frames which need a lot of subsetting, like in
>>>> this example:
>>>>
>>>> # example
>>>> eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))
>>>> egsub <- eg[eg$A < 0 & eg$B < 1 & eg$C > 0, ]
>>>> egsub
>>>> egsub2 <- eg[eg$A > 1 & eg$B > 0, ]
>>>> egsub2
>>>>
>>>> # To make this clearer than 1000s of lines of extractions with []
>>>> # I tried to make a function like this:
>>>>
>>>> # func(data="eg", A="< 0", B="< 1", C="> 0")
>>>>
>>>> # Which would also need to be run as
>>>>
>>>> # func(data="eg", A="> 1", B="> 0", C=NA)
>>>> #end
>>>>
>>>> Notably:
>>>> -the signs* "<" and ">" need to be flexible _and_ optional
>>>> -the quantities also need to be flexible
>>>> -column header names, i.e., A, B and C, don't need flexibility,
>>>> i.e., can remain fixed
>>>> * "less than" and "greater than" so google picks up this thread
>>>>
>>>> Once again I find just how limited my grasp of R is... Is do.call() the
>>>> best way to call binary operators like < & > in a function? Is an ifelse
>>>> statement needed for each column to make filtering on it optional?
>>>> etc....
>>>>
>>>> Anyone with the patience to show their working version of such a
>>>> function would receive my undying Rdulation. With thanks in advance,
>>>>
>>>> Karl
>>>
>

-- 
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161



