[R] conditional selection of dataframe rows

David Winsemius dwinsemius at comcast.net
Fri Aug 13 00:15:46 CEST 2010


On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:

> Hi,
>
> I do want to look only at slope.
> If there is one negative slope measurement  for a given day and a
> given chamber, I would like to remove all other slope measurements
> for that day and that chamber, even if they are positive.
>
> On one day, I will have 20 slope measurements for each chamber.  If
> one is negative, I would like to delete the other 19 for that chamber
> on that day, even if they are positive.  I have measurements for
> every day of the year, for 4 years and multiple chambers.
>
> I know I could make some awful nested loop with a vector of day and
> chamber numbers for each occurrence of a negative slope and then run
> that against the whole data set but I hope not to have to do that.
>
> Here is the rationale, if that helps.  These are unattended outdoor
> chambers that measure soil carbon efflux.  When the numbers go
> negative during part of the day but otherwise look normal, it usually
> means a plant has sprouted in the chamber and is using the carbon
> dioxide.  That means the measurements are all lower than they should
> be and I need to discard all measurements collected on that day,
> whether positive or negative.
>
> It might have been a little clearer if I'd made the toy dataframe a
> bit larger.

I think the fault was all mine. Failure to read for meaning. Here's an  
alternate strategy, although I think Schwartz's might be cleaner:

 > toy$ch.day.cat <- with(toy, paste(CH, DAY, sep="."))
 > negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))
 > negs.idxs
   3.4   3.5   4.4   4.5   5.4   5.5
FALSE FALSE FALSE FALSE FALSE  TRUE
 > toy[!toy$ch.day.cat %in% names(negs.idxs)[negs.idxs], ]
   CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5
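
A per-group flag vector such as negs.idxs has one element per chamber-day combination rather than one per row, so it has to be mapped back to rows before it is used for subsetting. A minimal sketch of that step, assuming the toy data frame from the original post (the names flags, drop.row and cleaned are only illustrative):

```r
# Toy data from the original post
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# One key per row identifying the chamber-day group
toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))

# One flag per group: TRUE if any slope in that group is negative
flags <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))

# Look up each row's group flag by name, giving one plain logical per row
drop.row <- as.vector(flags[toy$ch.day.cat])

# Keep only rows whose chamber-day group has no negative slope
cleaned <- toy[!drop.row, ]
cleaned
```

The lookup by name does not depend on the order in which tapply() happens to return the groups.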

-- 
David
>
> Thanks again for the assistance.
>
> Toby
>
>
>
> On 12 Aug 2010 at 16:39, David Winsemius wrote:
>
>>
>> On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:
>>
>>> Thank you all for the quick responses.  So far as I've checked,
>>> Marc's solution works perfectly and is quite speedy.  I'm still
>>> trying to figure out what it is doing. :)
>>>
>>> Henrique's solution seems to need some columns somewhere.  David's
>>> solution does not find all the other measurements, possibly with
>>> positive values, taken on the same day.
>>
>> I assumed you only wanted to look at what appeared to be a data
>> column, SLOPE. If you want to look at all columns for negatives then
>> try:
>>
>> toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ]  # or
>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>
>> This is how they differ w.r.t. their handling of NAs.
>>
>>> toy[3,2] <- NA
>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>    CH DAY SLOPE
>> 1   3   4   0.2
>> 2   4   4   0.3
>> NA NA  NA    NA
>> 4   3   4   0.5
>> 5   4   4   0.6
>> 6   5   5   0.2
>> 7   3   5   0.1
>> 8   4   5   0.0
>>> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
>>   CH DAY SLOPE
>> 1  3   4   0.2
>> 2  4   4   0.3
>> 4  3   4   0.5
>> 5  4   4   0.6
>> 6  5   5   0.2
>> 7  3   5   0.1
>> 8  4   5   0.0
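
The difference comes from how R treats NA in a logical subscript: an NA selects a row filled with NA, whereas which() returns only the positions that are TRUE. A small sketch on a stand-alone data frame (the names df, keep, by.logical and by.which are made up for illustration):

```r
df <- data.frame(a = c(1, NA, -1, 2), b = c(0, 1, 1, 3))

# all() gives NA when no comparison is FALSE but at least one is NA
keep <- apply(df, 1, function(x) all(x >= 0))
keep                              # values: TRUE, NA, FALSE, TRUE

by.logical <- df[keep, ]          # the NA subscript yields an all-NA row
by.which   <- df[which(keep), ]   # which() drops the NA position

nrow(by.logical)
nrow(by.which)
```

Which form is preferable depends on whether rows with missing values should be flagged or silently dropped.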
>>
>>
>>>
>>> Thank you again for your efforts.
>>>
>>> Toby
>>>
>>> On 12 Aug 2010 at 14:32, Marc Schwartz wrote:
>>>
>>>> On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:
>>>>
>>>>> On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:
>>>>>
>>>>>> Dear helpeRs,
>>>>>>
>>>>>> I have a dataframe (14947 x 27) containing measurements collected
>>>>>> every 5 seconds at several different sampling locations.  If one
>>>>>> measurement at a given location is less than zero on a given day, I
>>>>>> would like to delete all measurements from that location on that
>>>>>> day.
>>>>>>
>>>>>> Here is a toy example:
>>>>>>
>>>>>> toy <- data.frame(CH = rep(3:5,3), DAY = c(rep(4,5), rep(5,4)),
>>>>>> SLOPE = c(seq(0.2,0.6, .1),seq(0.2, -0.1, -0.1)))
>>>>>>
>>>>>> In this example, row 9 has a negative measurement for Chamber 5, so I
>>>>>> would like to delete row 6, which is the same Chamber on the same
>>>>>> day, but not row 3, which is the same chamber on a different day.  In
>>>>>> the full dataframe, there are, of course, many more days.
>>>>>>
>>>>>> Is there a handy R way to do this?
>>>>>>
>>>>>> Thank you for the assistance.
>>>>>>
>>>>>> Toby
>>>>>
>>>>>
>>>>>
>>>>> Not fully tested, but here is one possibility:
>>>>>
>>>>>> toy
>>>>> CH DAY SLOPE
>>>>> 1  3   4   0.2
>>>>> 2  4   4   0.3
>>>>> 3  5   4   0.4
>>>>> 4  3   4   0.5
>>>>> 5  4   4   0.6
>>>>> 6  5   5   0.2
>>>>> 7  3   5   0.1
>>>>> 8  4   5   0.0
>>>>> 9  5   5  -0.1
>>>>>
>>>>>
>>>>>> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
>>>>> CH DAY SLOPE
>>>>> 1  3   4   0.2
>>>>> 2  4   4   0.3
>>>>> 3  5   4   0.4
>>>>> 4  3   4   0.5
>>>>> 5  4   4   0.6
>>>>> 7  3   5   0.1
>>>>> 8  4   5   0.0
>>>>
>>>>
>>>> This can actually be slightly shortened to:
>>>>
>>>>> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
>>>> CH DAY SLOPE
>>>> 1  3   4   0.2
>>>> 2  4   4   0.3
>>>> 3  5   4   0.4
>>>> 4  3   4   0.5
>>>> 5  4   4   0.6
>>>> 7  3   5   0.1
>>>> 8  4   5   0.0
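
For readers wondering what the ave() call is doing: ave() applies FUN within each combination of the grouping variables and returns a vector as long as its first argument, with every row holding its own group's result. Because SLOPE is numeric, the logical result of any() ends up stored as 1 or 0, which is why comparing with 0 (or negating) picks out the groups without negatives. A sketch, assuming the toy data from the original post (has.neg and kept are illustrative names):

```r
toy <- data.frame(CH = rep(3:5, 3), DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# One value per row: 1 if the row's chamber-day group contains any
# negative slope, 0 otherwise
has.neg <- with(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
has.neg

# Keep only the rows whose group flag is 0
kept <- toy[has.neg == 0, ]
kept
```

Printing has.neg next to the data makes it easy to see which rows share a flagged chamber-day group.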
>>>>
>>>>
>>>> HTH,
>>>>
>>>> Marc
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
>

David Winsemius, MD
West Hartford, CT


