[R] conditional selection of dataframe rows
David Winsemius
dwinsemius at comcast.net
Fri Aug 13 00:15:46 CEST 2010
On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:
> Hi,
>
> I do want to look only at slope.
> If there is one negative slope measurement for a given day and a
> given chamber, I would like to remove all other slope measurements
> for that day and that chamber, even if they are positive.
>
> On one day, I will have 20 slope measurements for each chamber. If
> one is negative, I would like to delete the other 19 for that chamber
> on that day, even if they are positive. I have measurements for
> every day of the year, for 4 years and multiple chambers.
>
> I know I could make some awful nested loop with a vector of day and
> chamber numbers for each occurrence of a negative slope and then run
> that against the whole data set but I hope not to have to do that.
>
> Here is the rationale, if that helps. These are unattended outdoor
> chambers that measure soil carbon efflux. When the numbers go
> negative during part of the day but otherwise look normal, it usually
> means a plant has sprouted in the chamber and is using the carbon
> dioxide. That means the measurements are all lower than they should
> be and I need to discard all measurements collected on that day,
> whether positive or negative.
>
> It might have been a little clearer if I'd made the toy dataframe a
> bit larger.
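One detail worth checking with four years of data: if DAY is a day-of-year number that repeats each year (an assumption; the toy data has no year column), grouping by CH and DAY alone would pool the same calendar day across years, so a single negative reading would discard that day in every year. A minimal sketch, using a hypothetical YEAR column added to the grouping and the same ave() idiom as Marc's reply quoted further down:

```r
# Hypothetical data: YEAR is an assumed column, not part of the original toy data.
# DAY is treated as day-of-year, so it repeats from one year to the next.
dat <- data.frame(YEAR  = c(2009, 2009, 2010, 2010),
                  CH    = c(5, 5, 5, 5),
                  DAY   = c(5, 5, 5, 5),
                  SLOPE = c(0.2, -0.1, 0.3, 0.4))

has.neg <- function(x) any(x < 0)

# Grouping by CH and DAY only: the 2009 negative also flags the 2010 rows
flag.ch.day <- ave(dat$SLOPE, dat$CH, dat$DAY, FUN = has.neg)

# Grouping by YEAR, CH and DAY: only the 2009 chamber-day is flagged
flag.yr.ch.day <- ave(dat$SLOPE, dat$YEAR, dat$CH, dat$DAY, FUN = has.neg)

# Keep rows from unflagged year/chamber/day groups (the two 2010 rows)
dat[!flag.yr.ch.day, ]
```

If DAY is already a full date, the extra grouping column is unnecessary.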
I think the fault was all mine. Failure to read for meaning. Here's an
alternate strategy, although I think Schwartz's might be cleaner:
> toy$ch.day.cat <- with(toy, paste(CH, DAY, sep="."))
> negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))
> negs.idxs
  3.4   3.5   4.4   4.5   5.4   5.5
FALSE FALSE FALSE FALSE FALSE  TRUE
> toy[ !negs.idxs[toy$ch.day.cat], ]
  CH DAY SLOPE ch.day.cat
1  3   4   0.2        3.4
2  4   4   0.3        4.4
3  5   4   0.4        5.4
4  3   4   0.5        3.4
5  4   4   0.6        4.4
7  3   5   0.1        3.5
8  4   5   0.0        4.5
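A variant of the same chamber/day-key idea, sketched here as an illustration rather than code from the thread: collect the keys of groups that contain at least one negative SLOPE, then drop every row whose key is in that set with %in%. The names `key`, `bad.keys` and `clean` are hypothetical.

```r
# Toy data from the original question
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# One character key per chamber/day combination, e.g. "5.5"
key <- paste(toy$CH, toy$DAY, sep = ".")

# Keys of groups with at least one negative SLOPE;
# which() skips NA comparisons, so missing slopes do not flag a group
bad.keys <- unique(key[which(toy$SLOPE < 0)])

# Drop every row whose chamber/day key is flagged
clean <- toy[!(key %in% bad.keys), ]
clean
```

Because the lookup goes row key to flagged set, each row is matched to its own group, which avoids confusing group positions with row positions.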
--
David
>
> Thanks again for the assistance.
>
> Toby
>
>
>
> On 12 Aug 2010 at 16:39, David Winsemius wrote:
>
>>
>> On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:
>>
>>> Thank you all for the quick responses. So far as I've checked,
>>> Marc's solution works perfectly and is quite speedy. I'm still
>>> trying to figure out what it is doing. :)
>>>
>>> Henrique's solution seems to need some columns somewhere. David's
>>> solution does not find all the other measurements, possibly with
>>> positive values, taken on the same day.
>>
>> I assumed you only wanted to look at what appeared to be a data
>> column, SLOPE. If you want to look at all columns for negatives then
>> try:
>>
>> toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ] # or
>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>
>> This is how they differ with respect to their handling of NAs.
>>
>>> toy[3,2] <- NA
>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>    CH DAY SLOPE
>> 1   3   4   0.2
>> 2   4   4   0.3
>> NA NA  NA    NA
>> 4   3   4   0.5
>> 5   4   4   0.6
>> 6   5   5   0.2
>> 7   3   5   0.1
>> 8   4   5   0.0
>>> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
>>   CH DAY SLOPE
>> 1  3   4   0.2
>> 2  4   4   0.3
>> 4  3   4   0.5
>> 5  4   4   0.6
>> 6  5   5   0.2
>> 7  3   5   0.1
>> 8  4   5   0.0
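The NA difference above comes from how `[` treats a logical row index: all() over a row containing NA returns NA (unless some element is FALSE), an NA in the index returns a row of NAs named "NA", and which() keeps only the TRUE positions. A minimal illustration on a plain vector and a one-column data frame; `keep`, `x` and `df` are hypothetical names chosen for the demo.

```r
# all() over a row containing NA returns NA unless some element is FALSE
all(c(5, NA, 0.4) >= 0)   # NA
all(c(5, NA, -0.1) >= 0)  # FALSE

# A logical index containing NA, as apply(..., all) can produce
keep <- c(TRUE, NA, FALSE, TRUE)
x <- c(10, 20, 30, 40)

x[keep]         # 10 NA 40: the NA position yields an NA element
x[which(keep)]  # 10 40: which() keeps only the TRUE positions

# Data frame rows follow the same rule
df <- data.frame(a = x)
df[keep, , drop = FALSE]         # 3 rows, one of them named "NA"
df[which(keep), , drop = FALSE]  # 2 rows
```

So a row with a negative value is dropped either way; the two forms differ only for rows where the comparison is NA.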
>>
>>
>>>
>>> Thank you again for your efforts.
>>>
>>> Toby
>>>
>>> On 12 Aug 2010 at 14:32, Marc Schwartz wrote:
>>>
>>>> On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:
>>>>
>>>>> On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:
>>>>>
>>>>>> Dear helpeRs,
>>>>>>
>>>>>> I have a dataframe (14947 x 27) containing measurements collected
>>>>>> every 5 seconds at several different sampling locations. If one
>>>>>> measurement at a given location is less than zero on a given day, I
>>>>>> would like to delete all measurements from that location on that
>>>>>> day.
>>>>>>
>>>>>> Here is a toy example:
>>>>>>
>>>>>> toy <- data.frame(CH = rep(3:5,3), DAY = c(rep(4,5), rep(5,4)),
>>>>>> SLOPE = c(seq(0.2,0.6, .1),seq(0.2, -0.1, -0.1)))
>>>>>>
>>>>>> In this example, row 9 has a negative measurement for Chamber 5, so I
>>>>>> would like to delete row 6, which is the same chamber on the same
>>>>>> day, but not row 3, which is the same chamber on a different day. In
>>>>>> the full dataframe, there are, of course, many more days.
>>>>>>
>>>>>> Is there a handy R way to do this?
>>>>>>
>>>>>> Thank you for the assistance.
>>>>>>
>>>>>> Toby
>>>>>
>>>>>
>>>>>
>>>>> Not fully tested, but here is one possibility:
>>>>>
>>>>>> toy
>>>>>   CH DAY SLOPE
>>>>> 1  3   4   0.2
>>>>> 2  4   4   0.3
>>>>> 3  5   4   0.4
>>>>> 4  3   4   0.5
>>>>> 5  4   4   0.6
>>>>> 6  5   5   0.2
>>>>> 7  3   5   0.1
>>>>> 8  4   5   0.0
>>>>> 9  5   5  -0.1
>>>>>
>>>>>
>>>>>> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
>>>>>   CH DAY SLOPE
>>>>> 1  3   4   0.2
>>>>> 2  4   4   0.3
>>>>> 3  5   4   0.4
>>>>> 4  3   4   0.5
>>>>> 5  4   4   0.6
>>>>> 7  3   5   0.1
>>>>> 8  4   5   0.0
>>>>
>>>>
>>>> This can actually be slightly shortened to:
>>>>
>>>>> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
>>>>   CH DAY SLOPE
>>>> 1  3   4   0.2
>>>> 2  4   4   0.3
>>>> 3  5   4   0.4
>>>> 4  3   4   0.5
>>>> 5  4   4   0.6
>>>> 7  3   5   0.1
>>>> 8  4   5   0.0
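What makes the ave() version work, as far as ave()'s documented behavior goes: it applies FUN within each CH x DAY group and writes the group's result back to every row of that group, returning a vector as long as SLOPE. Since SLOPE is numeric, the TRUE/FALSE from any() is stored as 1/0, which is why both `== 0` and `!` yield keep flags. A small sketch that prints the intermediate vector (`flag` is a hypothetical name):

```r
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# ave() applies FUN within each CH x DAY group and writes the group result
# back to every row of that group. Because SLOPE is numeric, the logical
# result of any() is stored as 1 (TRUE) or 0 (FALSE).
flag <- ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
flag  # 0 0 0 0 0 1 0 0 1: rows 6 and 9 are chamber 5 on day 5

# !flag maps 0 to TRUE (keep) and 1 to FALSE (drop)
toy[!flag, ]
```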
>>>>
>>>>
>>>> HTH,
>>>>
>>>> Marc
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
>
David Winsemius, MD
West Hartford, CT