[R] conditional selection of dataframe rows
David Winsemius
dwinsemius at comcast.net
Fri Aug 13 00:39:59 CEST 2010
On Aug 12, 2010, at 6:15 PM, David Winsemius wrote:
>
> On Aug 12, 2010, at 5:20 PM, Toby Gass wrote:
>
>> Hi,
>>
>> I do want to look only at slope.
>> If there is one negative slope measurement for a given day and a
>> given chamber, I would like to remove all other slope measurements
>> for that day and that chamber, even if they are positive.
>>
>> On one day, I will have 20 slope measurements for each chamber. If
>> one is negative, I would like to delete the other 19 for that chamber
>> on that day, even if they are positive. I have measurements for
>> every day of the year, for 4 years and multiple chambers.
>>
>> I know I could make some awful nested loop with a vector of day and
>> chamber numbers for each occurrence of a negative slope and then run
>> that against the whole data set but I hope not to have to do that.
>>
>> Here is the rationale, if that helps. These are unattended outdoor
>> chambers that measure soil carbon efflux. When the numbers go
>> negative during part of the day but otherwise look normal, it usually
>> means a plant has sprouted in the chamber and is using the carbon
>> dioxide. That means the measurements are all lower than they should
>> be and I need to discard all measurements collected on that day,
>> whether positive or negative.
>>
>> It might have been a little clearer if I'd made the toy dataframe a
>> bit larger.
>
> I think the fault was all mine. Failure to read for meaning. Here's
> an alternate strategy, although I think Schwartz's might be cleaner:
>
> > toy$ch.day.cat <- with(toy, paste(CH, DAY, sep="."))
> > negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))
> > negs.idxs
> 3.4 3.5 4.4 4.5 5.4 5.5
> FALSE FALSE FALSE FALSE FALSE TRUE
> > toy[-which(negs.idxs), ]
> CH DAY SLOPE ch.day.cat
> 1 3 4 0.2 3.4
> 2 4 4 0.3 4.4
> 3 5 4 0.4 5.4
> 4 3 4 0.5 3.4
> 5 4 4 0.6 4.4
> 7 3 5 0.1 3.5
> 8 4 5 0.0 4.5
> 9 5 5 -0.1 5.5
>
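I think I should give up for today. I saw that the above code eliminates
row 6, and only after posting did I see that row 9 was left in. The
negative index from which() refers to positions in negs.idxs (one per
CH.DAY group), not to rows of toy, so it dropped row 6 only by
coincidence. Matching on the group names instead:

require(rms)  # for %nin% (rms loads Hmisc, which defines it); alternatively,
              # negate %in%, as the %w/o% example on the ?match help page does: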
> toy[toy$ch.day.cat %nin% names(negs.idxs[negs.idxs]), ]
CH DAY SLOPE ch.day.cat
1 3 4 0.2 3.4
2 4 4 0.3 4.4
3 5 4 0.4 5.4
4 3 4 0.5 3.4
5 4 4 0.6 4.4
7 3 5 0.1 3.5
8 4 5 0.0 4.5
Now I am really sure that the ave( , , any) strategy is superior.
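
For anyone who would rather not load rms just for %nin%, here is a minimal base-R sketch of the same name-matching idea. It assumes the toy data frame from the original question; the name bad.groups is introduced here purely for illustration and is not from the thread.

```r
# Toy data from the original question
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# One key per chamber/day combination
toy$ch.day.cat <- with(toy, paste(CH, DAY, sep = "."))

# TRUE for every chamber/day group that contains at least one negative slope
negs.idxs <- tapply(toy$SLOPE, toy$ch.day.cat, function(x) any(x < 0))

# Names of the flagged groups (hypothetical helper name)
bad.groups <- names(negs.idxs)[negs.idxs]

# Keep only rows whose key is not among the flagged groups
kept <- toy[!(toy$ch.day.cat %in% bad.groups), ]
kept
```

The point of matching on names is that the filter works per row of toy, whereas negs.idxs has only one element per group.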
> --
> David
>>
>> Thanks again for the assistance.
>>
>> Toby
>>
>>
>>
>> On 12 Aug 2010 at 16:39, David Winsemius wrote:
>>
>>>
>>> On Aug 12, 2010, at 4:06 PM, Toby Gass wrote:
>>>
>>>> Thank you all for the quick responses. So far as I've checked,
>>>> Marc's solution works perfectly and is quite speedy. I'm still
>>>> trying to figure out what it is doing. :)
>>>>
>>>> Henrique's solution seems to need some columns somewhere. David's
>>>> solution does not find all the other measurements, possibly with
>>>> positive values, taken on the same day.
>>>
>>> I assumed you only wanted to look at what appeared to be a data
>>> column, SLOPE. If you want to look at all columns for negatives then
>>> try:
>>>
>>> toy[ which( apply(toy, 1, function(x) all(x >= 0)) ), ] # or
>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>>
>>> This is how they differ w.r.t. their handling of NAs:
>>>
>>>> toy[3,2] <- NA
>>>> toy[ apply(toy, 1, function(x) all(x >= 0)) , ]
>>> CH DAY SLOPE
>>> 1 3 4 0.2
>>> 2 4 4 0.3
>>> NA NA NA NA
>>> 4 3 4 0.5
>>> 5 4 4 0.6
>>> 6 5 5 0.2
>>> 7 3 5 0.1
>>> 8 4 5 0.0
>>>> toy[ which(apply(toy, 1, function(x) all(x >= 0)) ), ]
>>> CH DAY SLOPE
>>> 1 3 4 0.2
>>> 2 4 4 0.3
>>> 4 3 4 0.5
>>> 5 4 4 0.6
>>> 6 5 5 0.2
>>> 7 3 5 0.1
>>> 8 4 5 0.0
>>>
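
The NA behaviour shown above can be reproduced with the short sketch below. It assumes the same toy data; the names keep, by.logical, by.which and by.naaware are illustrative only. The third variant is an assumption of mine about an equivalent spelling, not something proposed in the thread.

```r
# Toy data from the original question, with one DAY value set to NA
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))
toy[3, 2] <- NA

# Row 3 evaluates to NA (all(TRUE, NA, TRUE)); row 9 evaluates to FALSE
keep <- apply(toy, 1, function(x) all(x >= 0))

# An NA in a logical index produces a row consisting entirely of NAs
by.logical <- toy[keep, ]

# which() returns only the positions that are TRUE, so the NA row is omitted
by.which <- toy[which(keep), ]

# An explicit NA check selects the same rows as which()
by.naaware <- toy[!is.na(keep) & keep, ]

nrow(by.logical)
nrow(by.which)
```

Whether the NA row should be dropped or kept for inspection depends on the data, so neither form is automatically the right one.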
>>>
>>>>
>>>> Thank you again for your efforts.
>>>>
>>>> Toby
>>>>
>>>> On 12 Aug 2010 at 14:32, Marc Schwartz wrote:
>>>>
>>>>> On Aug 12, 2010, at 2:24 PM, Marc Schwartz wrote:
>>>>>
>>>>>> On Aug 12, 2010, at 2:11 PM, Toby Gass wrote:
>>>>>>
>>>>>>> Dear helpeRs,
>>>>>>>
>>>>>>> I have a dataframe (14947 x 27) containing measurements collected
>>>>>>> every 5 seconds at several different sampling locations. If one
>>>>>>> measurement at a given location is less than zero on a given day, I
>>>>>>> would like to delete all measurements from that location on that day.
>>>>>>>
>>>>>>> Here is a toy example:
>>>>>>>
>>>>>>> toy <- data.frame(CH = rep(3:5,3), DAY = c(rep(4,5), rep(5,4)),
>>>>>>> SLOPE = c(seq(0.2,0.6, .1),seq(0.2, -0.1, -0.1)))
>>>>>>>
>>>>>>> In this example, row 9 has a negative measurement for Chamber 5, so I
>>>>>>> would like to delete row 6, which is the same Chamber on the same
>>>>>>> day, but not row 3, which is the same chamber on a different day. In
>>>>>>> the full dataframe, there are, of course, many more days.
>>>>>>>
>>>>>>> Is there a handy R way to do this?
>>>>>>>
>>>>>>> Thank you for the assistance.
>>>>>>>
>>>>>>> Toby
>>>>>>
>>>>>>
>>>>>>
>>>>>> Not fully tested, but here is one possibility:
>>>>>>
>>>>>>> toy
>>>>>> CH DAY SLOPE
>>>>>> 1 3 4 0.2
>>>>>> 2 4 4 0.3
>>>>>> 3 5 4 0.4
>>>>>> 4 3 4 0.5
>>>>>> 5 4 4 0.6
>>>>>> 6 5 5 0.2
>>>>>> 7 3 5 0.1
>>>>>> 8 4 5 0.0
>>>>>> 9 5 5 -0.1
>>>>>>
>>>>>>
>>>>>>> subset(toy, ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)) == 0)
>>>>>> CH DAY SLOPE
>>>>>> 1 3 4 0.2
>>>>>> 2 4 4 0.3
>>>>>> 3 5 4 0.4
>>>>>> 4 3 4 0.5
>>>>>> 5 4 4 0.6
>>>>>> 7 3 5 0.1
>>>>>> 8 4 5 0.0
>>>>>
>>>>>
>>>>> This can actually be slightly shortened to:
>>>>>
>>>>>> subset(toy, !ave(SLOPE, CH, DAY, FUN = function(x) any(x < 0)))
>>>>> CH DAY SLOPE
>>>>> 1 3 4 0.2
>>>>> 2 4 4 0.3
>>>>> 3 5 4 0.4
>>>>> 4 3 4 0.5
>>>>> 5 4 4 0.6
>>>>> 7 3 5 0.1
>>>>> 8 4 5 0.0
>>>>>
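
Since the question of what the ave() call is doing came up in this thread, here is a small sketch that exposes the intermediate vector. It assumes the toy data from the original question; the names flag and cleaned are made up for illustration.

```r
# Toy data from the original question
toy <- data.frame(CH = rep(3:5, 3),
                  DAY = c(rep(4, 5), rep(5, 4)),
                  SLOPE = c(seq(0.2, 0.6, 0.1), seq(0.2, -0.1, -0.1)))

# ave() applies FUN within each CH/DAY group and returns a vector as long
# as SLOPE. The single logical from any() is recycled across the group and
# stored as numeric: 1 for groups with a negative slope, 0 otherwise.
flag <- ave(toy$SLOPE, toy$CH, toy$DAY, FUN = function(x) any(x < 0))
flag

# Keep the rows whose group was not flagged
cleaned <- toy[flag == 0, ]
cleaned
```

Because flag lines up row by row with toy, it can be used directly as a filter, which is what the subset() calls above do in one step.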
>>>>>
>>>>> HTH,
>>>>>
>>>>> Marc
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
David Winsemius, MD
West Hartford, CT