[R] subsets
Ivan Calandra
ivan.calandra at uni-hamburg.de
Thu Jan 20 14:39:04 CET 2011
Hi Taras,
Indeed, I've overlooked the problem. Anyway, I'm not sure I would have
been able to give a complete answer like you did!
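For reference, here is a minimal sketch of the approach quoted below, run on Den's sample rows so that all three data sets are built without overwriting one another. The names `patients` and `pick` are hypothetical (they are not in the original posts), and the data frame is simply typed in by hand from the question:

```r
# Den's sample rows, typed in by hand. 'patients' is a hypothetical name,
# used instead of 'df' to avoid confusion with stats::df.
patients <- data.frame(
  id        = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every row of the patients whose complete set of
# diagnoses passes 'test' (a function returning a single TRUE or FALSE).
pick <- function(data, test) {
  ids <- aggregate(diagnosis ~ id, data, test)  # one logical value per id
  data[data$id %in% ids$id[ids$diagnosis], ]
}

both     <- pick(patients, function(x) "ah" %in% x && "ihd" %in% x)
ah_only  <- pick(patients, function(x) "ah" %in% x && !("ihd" %in% x))
ihd_only <- pick(patients, function(x) !("ah" %in% x) && "ihd" %in% x)

unique(both$id)      # 2 4
unique(ah_only$id)   # 1 3
unique(ihd_only$id)  # 5
```

Each result keeps all diagnoses of the selected patients (for example, patient 2 keeps the "im" row), which is what the two-step aggregate/subset approach gives as well.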
Ivan
On 1/20/2011 11:05, Taras Zakharko wrote:
> Hello Den,
>
> your problem is not as simple as it may seem, so Ivan's suggestion is only a partial answer. I see that each patient can have
> more than one diagnosis, and I take it that you want to isolate patients based on particular combinations of conditions.
> Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients who have either of those, but not
> necessarily patients who have both.
>
> Instead, one must apply the condition to the whole set of diagnoses associated with each patient.
> I think this is best done with the aggregate function. This function splits the data according to some
> factor (in our case the patient id) and performs a routine on each subset (in our case
> a condition test):
>
>
> # 1. patients with ah and ihd
> ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)
> # 2. patients with ah but no ihd
> ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && !("ihd" %in% x))
> # 3. patients with ihd but no ah
> ids <- aggregate(diagnosis ~ id, df, function(x) !("ah" %in% x) && "ihd" %in% x)
>
> After any one of these calls (each one overwrites ids), ids will contain a data frame like:
>
> id diagnosis
> 1 TRUE
> 2 FALSE
> 3 FALSE
> ...
>
> which shows which patients have the set of diagnoses you asked for. You can then use these
> ids to filter the original data with something like:
>
> subset(df, id %in% subset(ids, diagnosis == TRUE)$id)
>
> This takes from the 'ids' data frame only the patients for which the condition is TRUE, and then extracts the associated
> diagnosis sets from the original 'df' data frame.
>
> Hope it helps,
>
> Taras
> On Jan 20, 2011, at 9:53 , Den wrote:
>
>> Dear R people
>> Could you please help.
>>
>> Basically, there are two variables in my data set. Each patient ('id')
>> may have one or more diseases ('diagnosis'). It looks like
>>
>> id diagnosis
>> 1 ah
>> 2 ah
>> 2 ihd
>> 2 im
>> 3 ah
>> 3 stroke
>> 4 ah
>> 4 ihd
>> 4 angina
>> 5 ihd
>> ..............
>> Q: How to make three data sets:
>> 1. Patients with ah and ihd
>> 2. Patients with ah but no ihd
>> 3. Patients with ihd but no ah?
>>
>> If you have any ideas, could you just point me to what I should look for? Is it
>> subset, or aggregate, or loops, or something else??? I am a bit lost. (F1
>> F1 F1 !!!:)
>> Thank you
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de
**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php