[R] Locate Patients who have multiple high blood pressure readings
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Jan 31 20:20:00 CET 2013
On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
> On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
>
>> Hi,
>>
>> I have a new question about subsetting in R.
>>
>> Say we have this data frame:
>>
>>     PT_ID Blood_Pressure OBS_TYPE
>> 92   1900           90.0      DBP
>> 94   1900           90.0      DBP
>> 174  2900          140.0      SBP
>> 176  2900          130.0      SBP
>> 180  3900          120.0      SBP
>> 268  3900          150.0      SBP
>> 268  3900           90.0      DBP
>>
>> I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.
>>
>> PT_ID=1900, he has 2 DBP>=90, so he will be included.
>>
>> PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
>>
>> PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
>> included.
>>
>> So, the condition requires TWO OR MORE values higher than the threshold.
>> It could be either SBP or DBP or both of them.
>>
>> I have tried ddply, but I don’t know how to add the condition 2+ inside
>> ddply.
>>
This can be specified in a reasonably natural fashion using SQL. Here
DF is the input data frame:
> library(sqldf)
> sqldf("select
+ PT_ID,
+ sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
+ sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
+ from DF
+ group by PT_ID
+ having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0
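
For comparison, the same per-patient counting can probably be done in base R without sqldf. The following is only a minimal sketch: it assumes DF has the three columns shown in the question, rebuilds the sample data by hand (row names dropped), and uses illustrative names (high_dbp, high_sbp, dbp, sbp, keep, result) that do not come from the original thread.

```r
# Rebuild the sample data from the question (assumed layout, row names dropped).
DF <- data.frame(
  PT_ID          = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP"),
  stringsAsFactors = FALSE
)

# Flag each reading that meets the threshold for its own type.
high_dbp <- DF$OBS_TYPE == "DBP" & DF$Blood_Pressure >= 90
high_sbp <- DF$OBS_TYPE == "SBP" & DF$Blood_Pressure >= 140

# Count flagged readings per patient (summing a logical counts the TRUEs).
dbp <- tapply(high_dbp, DF$PT_ID, sum)
sbp <- tapply(high_sbp, DF$PT_ID, sum)

# Keep patients with two or more high readings of the same type.
keep <- dbp >= 2 | sbp >= 2
result <- data.frame(
  PT_ID = as.numeric(names(dbp)[keep]),
  DBP   = as.vector(dbp[keep]),
  SBP   = as.vector(sbp[keep])
)
print(result)
```

On the sample data this should leave only patient 1900, in line with the sqldf result above. Both tapply() calls group by the same PT_ID vector, so dbp and sbp come back in the same patient order and can be compared element by element.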