[R] [test message] Can R replicate this data manipulation in SAS?
Ted Harding
ted.harding at wlandres.net
Wed Apr 20 22:38:20 CEST 2011
Apologies for troubling the list, but it is a test that
needs to be carried out. I am resending the message that
I sent earlier on behalf of Paul Miller, but with a
certain word used in the variable names of the SAS code
changed to a different word.
With thanks for your tolerance ...
Ted.
[*** PLEASE NOTE: I am sending this message on behalf of
Paul Miller:
Paul Miller <pjmiller_57 at yahoo.com>
(to whom this message has also been copied). He has been
trying to send it, but it has never got through. Please
do not reply to me, but to the list and/or to Paul
at that address ***]
==========================================================
Hello Everyone,
I'm learning R and am trying to get a better sense of what it
will and will not do. I'm hearing in some places that R may
not be able to accomplish all of the data manipulation tasks
that SAS can. In others, I'm hearing that R can do pretty much
any data manipulation that SAS can but the way in which it
does so is likely to be quite different.
Below is some SAS syntax that codes Highly Active Antiretroviral
Therapy (HAART) regimens in HIV patients by retaining the values
of variables. Interspersed between the bits of code are printouts
of data sets that are created in the process of coding. I'm hoping
this will come through clearly and that people will be able to see
exactly what is being done. Basically, the code keeps track of how
many books people are on and what types of books they are taking
during specific periods of time and decides whether that constitutes
HAART or not.
To me, this is a pretty tricky data manipulation in SAS. Is there
any way to get the equivalent result in R?
Thanks,
Paul
**** SAS syntax for coding HAART in HIV patients;
**** Read in test data;
data haart;
  input id book_class $ start_date :mmddyy. stop_date :mmddyy.;
  format start_date stop_date mmddyy8.;
  cards;
1004 NRTI 07/24/95 01/05/99
1004 NRTI 11/20/95 12/10/95
1004 NRTI 01/10/96 01/05/99
1004 PI 05/09/96 11/16/97
1004 NRTI 06/01/96 02/01/97
1004 NRTI 07/01/96 03/01/97
9999 PI 01/02/03 .
9999 NNRTI 04/05/06 07/08/09
;
run;
proc print data=haart;
run;
             book_    start_     stop_
Obs    id    class      date      date
  1  1004    NRTI   07/24/95  01/05/99
  2  1004    NRTI   11/20/95  12/10/95
  3  1004    NRTI   01/10/96  01/05/99
  4  1004    PI     05/09/96  11/16/97
  5  1004    NRTI   06/01/96  02/01/97
  6  1004    NRTI   07/01/96  03/01/97
  7  9999    PI     01/02/03         .
  8  9999    NNRTI  04/05/06  07/08/09
**** Reshape data into series with 1 date rather than separate starts and stops;
data changes (drop=start_date stop_date where=(not missing(date)));
  set haart;
  date = start_date;
  change = 1;
  output;
  date = stop_date;
  change = -1;
  output;
  format date mmddyy10.;
run;
proc sort data=changes;
  by id date;
run;
proc print data=changes;
run;
             book_
Obs    id    class        date  change
  1  1004    NRTI   07/24/1995       1
  2  1004    NRTI   11/20/1995       1
  3  1004    NRTI   12/10/1995      -1
  4  1004    NRTI   01/10/1996       1
  5  1004    PI     05/09/1996       1
  6  1004    NRTI   06/01/1996       1
  7  1004    NRTI   07/01/1996       1
  8  1004    NRTI   02/01/1997      -1
  9  1004    NRTI   03/01/1997      -1
 10  1004    PI     11/16/1997      -1
 11  1004    NRTI   01/05/1999      -1
 12  1004    NRTI   01/05/1999      -1
 13  9999    PI     01/02/2003       1
 14  9999    NNRTI  04/05/2006       1
 15  9999    NNRTI  07/08/2009      -1
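For readers who do not read SAS, the reshaping step above can be restated outside SAS. What follows is only a rough sketch in Python (not R, and not part of the SAS program): each start/stop interval becomes a +1 event at its start and a -1 event at its stop, and missing stop dates are dropped. The names `haart`, `to_changes` and `changes` are chosen for illustration, and the two-digit years are written out as the four-digit years shown in the printout.

```python
from datetime import date

# Test data copied from the SAS cards block; None marks the missing stop date.
haart = [
    (1004, "NRTI", date(1995, 7, 24), date(1999, 1, 5)),
    (1004, "NRTI", date(1995, 11, 20), date(1995, 12, 10)),
    (1004, "NRTI", date(1996, 1, 10), date(1999, 1, 5)),
    (1004, "PI", date(1996, 5, 9), date(1997, 11, 16)),
    (1004, "NRTI", date(1996, 6, 1), date(1997, 2, 1)),
    (1004, "NRTI", date(1996, 7, 1), date(1997, 3, 1)),
    (9999, "PI", date(2003, 1, 2), None),
    (9999, "NNRTI", date(2006, 4, 5), date(2009, 7, 8)),
]


def to_changes(rows):
    """Return (id, book_class, date, change) events: +1 at start, -1 at stop."""
    events = []
    for pid, book_class, start, stop in rows:
        for when, change in ((start, 1), (stop, -1)):
            if when is not None:  # plays the role of where=(not missing(date))
                events.append((pid, book_class, when, change))
    # sorted() is stable, so events sharing an id and date keep their input order.
    return sorted(events, key=lambda e: (e[0], e[2]))


changes = to_changes(haart)
for row in changes:
    print(row)
```

The sort key is (id, date) only, matching the BY variables of the PROC SORT step; ordering within a date does not matter for the later counts because all events on a date are summed before anything is output.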
**** Get regimen information plus start and stop dates;
data cumulative(drop=book_class change stop_date)
     stop_dates(keep=id regimen stop_date);
  set changes;
  by id date;
  if first.id then do;
    regimen = 0;
    NRTI = 0;
    NNRTI = 0;
    PI = 0;
  end;
  if book_class = 'NNRTI' then NNRTI + change;
  else if book_class = 'NRTI' then NRTI + change;
  else if book_class = 'PI ' then PI + change;
  if last.date then do;
    stop_date = date - 1;
    if regimen then output stop_dates;
    regimen + 1;
    allbooks = NNRTI + NRTI + PI;
    HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
            (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
            (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
    output cumulative;
  end;
  format stop_date mmddyy10.;
run;
proc print data=cumulative;
run;
Obs    id        date  regimen  NRTI  NNRTI  PI  allbooks  HAART
  1  1004  07/24/1995        1     1      0   0         1      0
  2  1004  11/20/1995        2     2      0   0         2      0
  3  1004  12/10/1995        3     1      0   0         1      0
  4  1004  01/10/1996        4     2      0   0         2      0
  5  1004  05/09/1996        5     2      0   1         3      1
  6  1004  06/01/1996        6     3      0   1         4      1
  7  1004  07/01/1996        7     4      0   1         5      1
  8  1004  02/01/1997        8     3      0   1         4      1
  9  1004  03/01/1997        9     2      0   1         3      1
 10  1004  11/16/1997       10     2      0   0         2      0
 11  1004  01/05/1999       11     0      0   0         0      0
 12  9999  01/02/2003        1     0      0   1         1      0
 13  9999  04/05/2006        2     0      1   1         2      0
 14  9999  07/08/2009        3     0      0   1         1      0
proc print data=stop_dates;
run;
Obs    id  regimen   stop_date
  1  1004        1  11/19/1995
  2  1004        2  12/09/1995
  3  1004        3  01/09/1996
  4  1004        4  05/08/1996
  5  1004        5  05/31/1996
  6  1004        6  06/30/1996
  7  1004        7  01/31/1997
  8  1004        8  02/28/1997
  9  1004        9  11/15/1997
 10  1004       10  01/04/1999
 11  9999        1  04/04/2006
 12  9999        2  07/07/2009
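The retained-variable logic of the cumulative data step can also be sketched outside SAS. The Python below is an illustration only (again not R, and not a claim about how it should be done): it keeps running counts per id, emits one row per distinct date, closes the previous regimen on the day before, and applies the same HAART rule. The helper names `is_haart` and `accumulate` are made up for this sketch; the input rows are typed in from the `changes` printout.

```python
from datetime import date, timedelta
from itertools import groupby

# Rows of the changes printout: (id, book_class, ISO date, change).
CHANGES = [
    (1004, "NRTI", "1995-07-24", 1), (1004, "NRTI", "1995-11-20", 1),
    (1004, "NRTI", "1995-12-10", -1), (1004, "NRTI", "1996-01-10", 1),
    (1004, "PI", "1996-05-09", 1), (1004, "NRTI", "1996-06-01", 1),
    (1004, "NRTI", "1996-07-01", 1), (1004, "NRTI", "1997-02-01", -1),
    (1004, "NRTI", "1997-03-01", -1), (1004, "PI", "1997-11-16", -1),
    (1004, "NRTI", "1999-01-05", -1), (1004, "NRTI", "1999-01-05", -1),
    (9999, "PI", "2003-01-02", 1), (9999, "NNRTI", "2006-04-05", 1),
    (9999, "NNRTI", "2009-07-08", -1),
]


def is_haart(nrti, nnrti, pi):
    """The HAART rule from the data step, returned as 0 or 1."""
    return int((nrti >= 3 and nnrti == 0 and pi == 0)
               or (nrti >= 2 and (nnrti >= 1 or pi >= 1))
               or (nrti == 1 and nnrti >= 1 and pi >= 1))


def accumulate(changes):
    """Return (cumulative, stop_dates); input must be sorted by id, then date."""
    cumulative, stop_dates = [], []
    for pid, id_rows in groupby(changes, key=lambda r: r[0]):
        counts = {"NRTI": 0, "NNRTI": 0, "PI": 0}  # reset, as at first.id
        regimen = 0
        for iso, day_rows in groupby(id_rows, key=lambda r: r[2]):
            for _, book_class, _, change in day_rows:
                if book_class in counts:  # other classes are ignored
                    counts[book_class] += change
            day = date.fromisoformat(iso)
            if regimen:  # the previous regimen ends the day before this date
                stop_dates.append({"id": pid, "regimen": regimen,
                                   "stop_date": day - timedelta(days=1)})
            regimen += 1
            cumulative.append({
                "id": pid, "date": day, "regimen": regimen, **counts,
                "allbooks": sum(counts.values()),
                "HAART": is_haart(counts["NRTI"], counts["NNRTI"], counts["PI"]),
            })
    return cumulative, stop_dates


cumulative, stop_dates = accumulate(CHANGES)
for row in cumulative:
    print(row)
```

Grouping by date inside each id stands in for the `last.date` test: all changes on one date are summed first, and only then is a single row written.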
**** Merge data to create regimens dataset;
data regimens;
  retain id start_date stop_date;
  merge cumulative(rename=(date=start_date)) stop_dates;
  by id regimen;
  if allbooks;
run;
proc print data=regimens;
run;
Obs    id  start_date   stop_date  regimen  NRTI  NNRTI  PI  allbooks  HAART
  1  1004  07/24/1995  11/19/1995        1     1      0   0         1      0
  2  1004  11/20/1995  12/09/1995        2     2      0   0         2      0
  3  1004  12/10/1995  01/09/1996        3     1      0   0         1      0
  4  1004  01/10/1996  05/08/1996        4     2      0   0         2      0
  5  1004  05/09/1996  05/31/1996        5     2      0   1         3      1
  6  1004  06/01/1996  06/30/1996        6     3      0   1         4      1
  7  1004  07/01/1996  01/31/1997        7     4      0   1         5      1
  8  1004  02/01/1997  02/28/1997        8     3      0   1         4      1
  9  1004  03/01/1997  11/15/1997        9     2      0   1         3      1
 10  1004  11/16/1997  01/04/1999       10     2      0   0         2      0
 11  9999  01/02/2003  04/04/2006        1     0      0   1         1      0
 12  9999  04/05/2006  07/07/2009        2     0      1   1         2      0
 13  9999  07/08/2009           .        3     0      0   1         1      0
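The final merge can be sketched the same way. In this data every (id, regimen) pair in stop_dates also appears in cumulative, so the match-merge behaves like a lookup of the stop date for each cumulative row, followed by dropping rows where nothing is active. The Python below is a hedged illustration on a subset of the printed rows (not R, and not the SAS program itself); `FIELDS`, `CUMULATIVE`, `STOP_DATES` and `build_regimens` are names invented for the sketch.

```python
from datetime import date

FIELDS = ("id", "date", "regimen", "NRTI", "NNRTI", "PI", "allbooks", "HAART")

# A subset of the cumulative printout (the last two rows for id 1004, all of id 9999).
CUMULATIVE = [dict(zip(FIELDS, row)) for row in [
    (1004, date(1997, 11, 16), 10, 2, 0, 0, 2, 0),
    (1004, date(1999, 1, 5), 11, 0, 0, 0, 0, 0),
    (9999, date(2003, 1, 2), 1, 0, 0, 1, 1, 0),
    (9999, date(2006, 4, 5), 2, 0, 1, 1, 2, 0),
    (9999, date(2009, 7, 8), 3, 0, 0, 1, 1, 0),
]]

# The matching stop_dates rows, keyed by (id, regimen).
STOP_DATES = {
    (1004, 10): date(1999, 1, 4),
    (9999, 1): date(2006, 4, 4),
    (9999, 2): date(2009, 7, 7),
}


def build_regimens(cumulative, stop_dates):
    """Attach a stop date to each regimen and keep only periods with allbooks > 0."""
    regimens = []
    for row in cumulative:
        if not row["allbooks"]:  # same effect as the subsetting IF
            continue
        out = {
            "id": row["id"],
            "start_date": row["date"],  # rename=(date=start_date)
            # None stands in for the SAS missing value "." (still ongoing).
            "stop_date": stop_dates.get((row["id"], row["regimen"])),
        }
        out.update({k: v for k, v in row.items() if k not in ("id", "date")})
        regimens.append(out)
    return regimens


regimens = build_regimens(CUMULATIVE, STOP_DATES)
for row in regimens:
    print(row)
```

Building each output row with id, start_date and stop_date first mirrors what the RETAIN statement is used for in the SAS step, which is only to fix the column order.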
==========================================================
Paul Miller
Paul Miller <pjmiller_57 at yahoo.com>
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-11 Time: 19:59:21
------------------------------ XFMail ------------------------------