[R] conditional selection of dataframe rows

David Winsemius dwinsemius at comcast.net
Fri Aug 13 00:39:59 CEST 2010


On Aug 12, 2010, at 6:15 PM, David Winsemius wrote:

>
> On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:
>
>> Hi,
>>
>> I do want to look only at slope.
>> If there is one negative slope measurement for a given day and a
>> given chamber, I would like to remove all other slope measurements
>> for that day and that chamber, even if they are positive.
>>
>> On one day, I will have 20 slope measurements for each chamber.  If
>> one is negative, I would like to delete the other 19 for that chamber
>> on that day, even if they are positive.  I have measurements for
>> every day of the year, for 4 years and multiple chambers.
>>
>> I know I could make some awful nested loop with a vector of day and
>> chamber numbers for each occurrence of a negative slope and then run
>> that against the whole data set but I hope not to have to do that.
>>
>> Here is the rationale, if that helps.  These are unattended outdoor
>> chambers that measure soil carbon efflux.  When the numbers go
>> negative during part of the day but otherwise look normal, it usually
>> means a plant has sprouted in the chamber and is using the carbon
>> dioxide.  That means the measurements are all lower than they should
>> be and I need to discard all measurements collected on that day,
>> whether positive or negative.
>>
>> It might have been a little clearer if I'd made the toy dataframe a
>> bit larger.
>
> I think the fault was all mine. Failure to read for meaning. Here's  
> an alternate strategy, although I think Schwartz's might be cleaner:
>
> > toy$ch.day.cat <- with(toy, paste(CH, DAY, sep="."))
> > negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))
> > negs.idxs
>  3.4   3.5   4.4   4.5   5.4   5.5
> FALSE FALSE FALSE FALSE FALSE  TRUE
> > toy[-which(negs.idxs), ]
>  CH DAY SLOPE ch.day.cat
> 1  3   4   0.2        3.4
> 2  4   4   0.3        4.4
> 3  5   4   0.4        5.4
> 4  3   4   0.5        3.4
> 5  4   4   0.6        4.4
> 7  3   5   0.1        3.5
> 8  4   5   0.0        4.5
> 9  5   5  -0.1        5.5
>

I think I should give up today. I saw that the above code eliminates
row 6, and only after posting saw that row 9 was left in. A corrected
version:

require(rms)   # for %nin%, or use the %w/o% operator defined on the ?match help page

 > toy[toy$ch.day.cat %nin% names(negs.idxs[negs.idxs]), ]
   CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5
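
For anyone without rms installed, the same selection can be sketched in
base R with a negated %in%. The sketch also shows why the earlier
-which() attempt went wrong: negs.idxs holds one value per chamber-day
group, not one per row, so which() returned a group position (6) that
only happened to match a row number. The names bad.groups and kept are
mine, introduced for illustration.

```r
# Toy data from the original post
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))
toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))

# One logical per chamber-day group: TRUE if any slope in it is negative
negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))

# This is a position among the groups, NOT a row number of toy
which(negs.idxs)

# Labels of the groups to drop (bad.groups is an illustrative name)
bad.groups <- names(negs.idxs)[negs.idxs]

# Keep only rows whose group label is not among the flagged groups
kept <- toy[!(toy$ch.day.cat %in% bad.groups), ]
kept
```

Matching on the group label rather than on a position is what removes
both row 6 and row 9.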

Now I am really sure that the ave(  , , any)  strategy is superior.
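
A minimal sketch of why the ave() approach avoids the problem above,
using the toy data from the original post. ave() applies the function
within each CH-by-DAY group and spreads the result back over the rows of
that group, so the flag is already aligned with the rows of toy and no
lookup by group label is needed. The names flag and clean are mine, for
illustration only.

```r
# Toy data from the original post
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# any(x < 0) is computed once per chamber-day group; ave() recycles it to
# every row of that group. Because SLOPE is numeric, the logical result
# is stored as 0 (no negatives in the group) or 1 (at least one negative).
flag <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
flag

# Keep only rows from groups with no negative slope
clean <- toy[flag == 0, ]
clean
```

If SLOPE can contain NA, any(x < 0, na.rm = TRUE) would be one way to
keep the flag itself from becoming NA. Whether a chamber-day with missing
readings should be kept is a separate judgment call.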


> -- 
> David
>>
>> Thanks again for the assistance.
>>
>> Toby
>>
>>
>>
>> On 12 Aug 2010 at 16:39, David Winsemius wrote:
>>
>>>
>>> On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:
>>>
>>>> Thank you all for the quick responses.  So far as I've checked,
>>>> Marc's solution works perfectly and is quite speedy.  I'm still
>>>> trying to figure out what it is doing. :)
>>>>
>>>> Henrique's solution seems to need some columns somewhere.  David's
>>>> solution does not find all the other measurements, possibly with
>>>> positive values, taken on the same day.
>>>
>>> I assumed you only wanted to look at what appeared to be a data
>>> column, SLOPE. If you want to look at all columns for negatives then
>>> try:
>>>
>>> toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ]  # or
>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>>
>>> This is how they differ w.r.t. their handling of NAs:
>>>
>>>> toy[3,2] <- NA
>>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>>   CH DAY SLOPE
>>> 1   3   4   0.2
>>> 2   4   4   0.3
>>> NA NA  NA    NA
>>> 4   3   4   0.5
>>> 5   4   4   0.6
>>> 6   5   5   0.2
>>> 7   3   5   0.1
>>> 8   4   5   0.0
>>>> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
>>>  CH DAY SLOPE
>>> 1  3   4   0.2
>>> 2  4   4   0.3
>>> 4  3   4   0.5
>>> 5  4   4   0.6
>>> 6  5   5   0.2
>>> 7  3   5   0.1
>>> 8  4   5   0.0
>>>
>>>
>>>>
>>>> Thank you again for your efforts.
>>>>
>>>> Toby
>>>>
>>>> On 12 Aug 2010 at 14:32, Marc Schwartz wrote:
>>>>
>>>>> On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:
>>>>>
>>>>>> On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:
>>>>>>
>>>>>>> Dear helpeRs,
>>>>>>>
>>>>>>> I have a dataframe (14947 x 27) containing measurements collected
>>>>>>> every 5 seconds at several different sampling locations.  If one
>>>>>>> measurement at a given location is less than zero on a given day, I
>>>>>>> would like to delete all measurements from that location on that
>>>>>>> day.
>>>>>>>
>>>>>>> Here is a toy example:
>>>>>>>
>>>>>>> toy <- data.frame(CH = rep(3:5,3), DAY = c(rep(4,5), rep(5,4)),
>>>>>>> SLOPE = c(seq(0.2,0.6, .1),seq(0.2, -0.1, -0.1)))
>>>>>>>
>>>>>>> In this example, row 9 has a negative measurement for Chamber 5, so I
>>>>>>> would like to delete row 6, which is the same Chamber on the same
>>>>>>> day, but not row 3, which is the same chamber on a different day.  In
>>>>>>> the full dataframe, there are, of course, many more days.
>>>>>>>
>>>>>>> Is there a handy R way to do this?
>>>>>>>
>>>>>>> Thank you for the assistance.
>>>>>>>
>>>>>>> Toby
>>>>>>
>>>>>>
>>>>>>
>>>>>> Not fully tested, but here is one possibility:
>>>>>>
>>>>>>> toy
>>>>>> CH DAY SLOPE
>>>>>> 1  3   4   0.2
>>>>>> 2  4   4   0.3
>>>>>> 3  5   4   0.4
>>>>>> 4  3   4   0.5
>>>>>> 5  4   4   0.6
>>>>>> 6  5   5   0.2
>>>>>> 7  3   5   0.1
>>>>>> 8  4   5   0.0
>>>>>> 9  5   5  -0.1
>>>>>>
>>>>>>
>>>>>>> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
>>>>>> CH DAY SLOPE
>>>>>> 1  3   4   0.2
>>>>>> 2  4   4   0.3
>>>>>> 3  5   4   0.4
>>>>>> 4  3   4   0.5
>>>>>> 5  4   4   0.6
>>>>>> 7  3   5   0.1
>>>>>> 8  4   5   0.0
>>>>>
>>>>>
>>>>> This can actually be slightly shortened to:
>>>>>
>>>>>> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
>>>>> CH DAY SLOPE
>>>>> 1  3   4   0.2
>>>>> 2  4   4   0.3
>>>>> 3  5   4   0.4
>>>>> 4  3   4   0.5
>>>>> 5  4   4   0.6
>>>>> 7  3   5   0.1
>>>>> 8  4   5   0.0
>>>>>
>>>>>
>>>>> HTH,
>>>>>
>>>>> Marc
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT