[R] Can R replicate this data manipulation in SAS?

Ista Zahn izahn at psych.rochester.edu
Wed Apr 20 23:53:18 CEST 2011


Oops, I missed the HAART part. Fortunately that translates straightforwardly:

n.dat$HAART <- with(n.dat, ifelse((NRTI >= 3 & NNRTI==0 & PI==0) |
                                  (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
                                  (NRTI == 1 & NNRTI >= 1 & PI >= 1),
                                  1, 0))
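
As a quick sanity check, here is a minimal sketch that applies the same rule to a few drug-count rows copied from the SAS printout quoted below. The helper name is_haart and the small counts data frame are made up for illustration; only base R is used.

```r
# Hypothetical helper wrapping the HAART rule from the SAS code
is_haart <- function(NRTI, NNRTI, PI) {
  as.integer((NRTI >= 3 & NNRTI == 0 & PI == 0) |
             (NRTI >= 2 & (NNRTI >= 1 | PI >= 1)) |
             (NRTI == 1 & NNRTI >= 1 & PI >= 1))
}

# Drug counts for regimens 4, 5, 7 and 10 of id 1004 and regimen 2 of id 9999,
# copied from the "cumulative" printout
counts <- data.frame(NRTI  = c(2, 2, 4, 2, 0),
                     NNRTI = c(0, 0, 0, 0, 1),
                     PI    = c(0, 1, 1, 0, 1))
counts$HAART <- with(counts, is_haart(NRTI, NNRTI, PI))
counts
```

The HAART column should agree with the corresponding rows of the SAS output.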

Best,
Ista

On Wed, Apr 20, 2011 at 5:22 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
> I think this is kind of like asking "will your Land Rover make it up
> my driveway?", but I'll assume the question was asked in all
> seriousness.
>
> Here is one solution:
>
> ## **** Read in test data;
> dat <- read.table(textConnection("id    drug      start       stop
> 1004    NRTI     07/24/95    01/05/99
> 1004    NRTI     11/20/95    12/10/95
> 1004    NRTI     01/10/96    01/05/99
> 1004    PI       05/09/96    11/16/97
> 1004    NRTI     06/01/96    02/01/97
> 1004    NRTI     07/01/96    03/01/97
> 9999    PI       01/02/03    NA
> 9999    NNRTI    04/05/06    07/08/09"), header=TRUE)
> closeAllConnections()
>
> dat$start <- as.Date(dat$start, format = "%m/%d/%y")
> dat$stop <- as.Date(dat$stop, format = "%m/%d/%y")
>
> ## **** Reshape data into series with 1 date rather than separate starts and
> ## stops;
>
> library(reshape)  # melt() and cast()
> library(plyr)     # ddply(), used below
>
> m.dat <- melt(dat, id = c("id", "drug"))
> m.dat <- m.dat[order(m.dat$id, m.dat$value),]
> m.dat$variable <- ifelse(m.dat$variable == "start", 1, -1)
> names(m.dat) <-  c("id", "drug", "value", "date")
> m.dat
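
For comparison, the same start/stop-to-event reshaping can be sketched in base R alone. This is only an illustration: the events data frame and its change column are names chosen here (borrowed from the SAS step), and the missing stop date is dropped explicitly, as the SAS where=(not missing(date)) option does.

```r
dat <- read.table(text = "id drug start stop
1004 NRTI  07/24/95 01/05/99
1004 NRTI  11/20/95 12/10/95
1004 NRTI  01/10/96 01/05/99
1004 PI    05/09/96 11/16/97
1004 NRTI  06/01/96 02/01/97
1004 NRTI  07/01/96 03/01/97
9999 PI    01/02/03 NA
9999 NNRTI 04/05/06 07/08/09", header = TRUE, stringsAsFactors = FALSE)
dat$start <- as.Date(dat$start, format = "%m/%d/%y")
dat$stop  <- as.Date(dat$stop,  format = "%m/%d/%y")

# One row per start (+1) and one row per stop (-1)
events <- rbind(
  data.frame(id = dat$id, drug = dat$drug, date = dat$start, change =  1),
  data.frame(id = dat$id, drug = dat$drug, date = dat$stop,  change = -1)
)
events <- events[!is.na(events$date), ]            # drop the open-ended stop
events <- events[order(events$id, events$date), ]  # sort by id, then date
rownames(events) <- NULL
events
```

The result has one row per change, which lines up with the 15 observations in the SAS "changes" printout.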
>
> ## **** Get regimen information plus start and stop dates;
>
> n.dat <- cast(m.dat, id + date ~ drug, fun.aggregate=sum, margins="grand_col")
> for (i in names(n.dat)[-c(1:2)]) {
>     n.dat[i] <- cumsum(n.dat[i])
> }
> n.dat <- ddply(n.dat, .(id), transform,
>                regimen = seq_along(id))
> n.dat
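
One thing to watch in the loop above: cumsum() runs down the whole column rather than restarting for each id. That is harmless for the test data, where every drug for id 1004 has a stop date so its counts return to zero, but it would carry counts over if an earlier patient were still on a drug. A minimal base R sketch of a per-id running count, using made-up data (chg and the ids 1 and 2 are hypothetical), could look like this:

```r
# Hypothetical per-date net changes for two patients, one column per drug class
chg <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  NRTI = c(1, 1, 0, 1, -1),
  PI   = c(0, 0, 1, 0, 0)
)

# Patient 1 never stops any drug, so a plain cumsum() would leak into patient 2;
# ave() applies cumsum() separately within each id
for (cl in c("NRTI", "PI")) {
  chg[[cl]] <- ave(chg[[cl]], chg$id, FUN = cumsum)
}

# Regimen counter restarting at 1 within each id
chg$regimen <- ave(chg$id, chg$id, FUN = seq_along)
chg
```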
>
> ssd.dat <- ddply(n.dat, .(id), summarize,
>                  id = id[-1],
>                  regimen = regimen[-length(regimen)],
>                  start_date = date[-length(date)],
>                  stop_date = date[-1] - 1)  # day before the next change, as in the SAS code
> ssd.dat
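
The start/stop pairing can also be sketched in base R by looking up the next change date within each id. This is an illustration with a hand-typed subset of the change dates; the reg data frame and next_date are names invented here. The stop date is taken as the day before the next change, following the SAS statement stop_date = date - 1, and the last regimen of each patient gets a missing stop date.

```r
# A few change dates per patient, copied from the SAS "cumulative" printout
reg <- data.frame(
  id   = c(1004, 1004, 1004, 9999, 9999),
  date = as.Date(c("1995-07-24", "1995-11-20", "1995-12-10",
                   "2003-01-02", "2006-04-05"))
)

# Next change date within each id (NA for the last regimen of a patient)
next_date <- ave(as.numeric(reg$date), reg$id,
                 FUN = function(d) c(d[-1], NA))

reg$start_date <- reg$date
reg$stop_date  <- as.Date(next_date, origin = "1970-01-01") - 1
reg
```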
>
> ## **** Merge data to create regimens dataset;
> all.dat <- merge(n.dat[-2], ssd.dat)
> all.dat <- all.dat[order(all.dat$id, all.dat$regimen), c("id",
> "start_date", "stop_date", "regimen", "NRTI", "NNRTI", "PI",
> "X.all.")]
> all.dat
>
>
> Best,
> Ista
>
>
>
> On Wed, Apr 20, 2011 at 2:59 PM, Ted Harding <ted.harding at wlandres.net> wrote:
>> [*** PLEASE NOTE: I am sending this message on behalf of
>>  Paul Miller:
>>  Paul Miller <pjmiller_57 at yahoo.com>
>>  (to whom this message has also been copied). He has been
>>  trying to send it, but it has never got through. Please
>>  do  not reply to me, but either to the list and/or to Paul
>>  at that address ***]
>> ==========================================================
>> Hello Everyone,
>>
>> I'm learning R and am trying to get a better sense of what it will and
>> will not do. I'm hearing in some places that R may not be able to
>> accomplish all of the data manipulation tasks that SAS can. In others,
>> I'm hearing that R can do pretty much any data manipulation that SAS can
>> but the way in which it does so is likely to be quite different.
>>
>> Below is some SAS syntax that codes Highly Active Antiretroviral Therapy
>> (HAART) regimens in HIV patients by retaining the values of variables.
>> Interspersed between the bits of code are printouts of data sets that are
>> created in the process of coding. I'm hoping this will come through
>> clearly and that people will be able to see exactly what is being done.
>> Basically, the code keeps track of how many drugs people are on and what
>> types of drugs they are taking during specific periods of time and decides
>> whether that constitutes HAART or not.
>>
>> To me, this is a pretty tricky data manipulation in SAS. Is there any way
>> to get the equivalent result in R?
>>
>> Thanks,
>>
>> Paul
>>
>>
>> **** SAS syntax for coding HAART in HIV patients;
>> **** Read in test data;
>>
>> data haart;
>> input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
>> format start_date stop_date mmddyy8.;
>> cards;
>> 1004 NRTI  07/24/95 01/05/99
>> 1004 NRTI  11/20/95 12/10/95
>> 1004 NRTI  01/10/96 01/05/99
>> 1004 PI    05/09/96 11/16/97
>> 1004 NRTI  06/01/96 02/01/97
>> 1004 NRTI  07/01/96 03/01/97
>> 9999 PI    01/02/03 .
>> 9999 NNRTI 04/05/06 07/08/09
>> ;
>> run;
>>
>> proc print data=haart;
>> run;
>>
>>               drug_      start_       stop_
>> Obs     id     class        date        date
>> 1     1004    NRTI     07/24/95    01/05/99
>> 2     1004    NRTI     11/20/95    12/10/95
>> 3     1004    NRTI     01/10/96    01/05/99
>> 4     1004    PI       05/09/96    11/16/97
>> 5     1004    NRTI     06/01/96    02/01/97
>> 6     1004    NRTI     07/01/96    03/01/97
>> 7     9999    PI       01/02/03           .
>> 8     9999    NNRTI    04/05/06    07/08/09
>>
>> **** Reshape data into series with 1 date rather than separate starts and
>> stops;
>>
>> data changes (drop=start_date stop_date where=(not missing(date)));
>> set haart;
>> date = start_date;
>> change =  1;
>> output;
>> date =  stop_date;
>> change = -1;
>> output;
>> format date mmddyy10.;
>> run;
>>
>> proc sort data=changes;
>> by id date;
>> run;
>>
>> proc print data=changes;
>> run;
>>
>>               drug_
>> Obs     id     class          date    change
>>  1    1004    NRTI     07/24/1995       1
>>  2    1004    NRTI     11/20/1995       1
>>  3    1004    NRTI     12/10/1995      -1
>>  4    1004    NRTI     01/10/1996       1
>>  5    1004    PI       05/09/1996       1
>>  6    1004    NRTI     06/01/1996       1
>>  7    1004    NRTI     07/01/1996       1
>>  8    1004    NRTI     02/01/1997      -1
>>  9    1004    NRTI     03/01/1997      -1
>> 10    1004    PI       11/16/1997      -1
>> 11    1004    NRTI     01/05/1999      -1
>> 12    1004    NRTI     01/05/1999      -1
>> 13    9999    PI       01/02/2003       1
>> 14    9999    NNRTI    04/05/2006       1
>> 15    9999    NNRTI    07/08/2009      -1
>>
>> **** Get regimen information plus start and stop dates;
>>
>> data cumulative(drop=drug_class change stop_date)
>>     stop_dates(keep=id regimen stop_date);
>> set changes;
>> by id date;
>>
>> if first.id then do;
>>  regimen = 0;
>>  NRTI = 0;
>>  NNRTI = 0;
>>  PI = 0;
>> end;
>>
>> if drug_class = 'NNRTI' then NNRTI + change;
>> else if drug_class = 'NRTI' then NRTI + change;
>> else if drug_class = 'PI  ' then PI + change;
>>
>> if last.date then do;
>>  stop_date = date - 1;
>>  if regimen then output stop_dates;
>>  regimen + 1;
>>  alldrugs = NNRTI + NRTI + PI;
>>  HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
>>    (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
>>    (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
>>  output cumulative;
>> end;
>>
>> format stop_date mmddyy10.;
>> run;
>>
>> proc print data=cumulative;
>> run;
>>
>> Obs     id           date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
>>  1    1004    07/24/1995        1        1       0       0        1         0
>>  2    1004    11/20/1995        2        2       0       0        2         0
>>  3    1004    12/10/1995        3        1       0       0        1         0
>>  4    1004    01/10/1996        4        2       0       0        2         0
>>  5    1004    05/09/1996        5        2       0       1        3         1
>>  6    1004    06/01/1996        6        3       0       1        4         1
>>  7    1004    07/01/1996        7        4       0       1        5         1
>>  8    1004    02/01/1997        8        3       0       1        4         1
>>  9    1004    03/01/1997        9        2       0       1        3         1
>> 10    1004    11/16/1997       10        2       0       0        2         0
>> 11    1004    01/05/1999       11        0       0       0        0         0
>> 12    9999    01/02/2003        1        0       0       1        1         0
>> 13    9999    04/05/2006        2        0       1       1        2         0
>> 14    9999    07/08/2009        3        0       0       1        1         0
>>
>> proc print data=stop_dates;
>> run;
>>
>> Obs     id     regimen     stop_date
>>  1    1004        1      11/19/1995
>>  2    1004        2      12/09/1995
>>  3    1004        3      01/09/1996
>>  4    1004        4      05/08/1996
>>  5    1004        5      05/31/1996
>>  6    1004        6      06/30/1996
>>  7    1004        7      01/31/1997
>>  8    1004        8      02/28/1997
>>  9    1004        9      11/15/1997
>> 10    1004       10      01/04/1999
>> 11    9999        1      04/04/2006
>> 12    9999        2      07/07/2009
>>
>> **** Merge data to create regimens dataset;
>>
>> data regimens;
>> retain id start_date stop_date;
>> merge cumulative(rename=(date=start_date)) stop_dates;
>> by id regimen;
>> if alldrugs;
>> run;
>>
>> proc print data=regimens;
>> run;
>>
>> Obs     id     start_date     stop_date    regimen    NRTI    NNRTI    PI    alldrugs    HAART
>>  1    1004    07/24/1995    11/19/1995        1        1       0       0        1         0
>>  2    1004    11/20/1995    12/09/1995        2        2       0       0        2         0
>>  3    1004    12/10/1995    01/09/1996        3        1       0       0        1         0
>>  4    1004    01/10/1996    05/08/1996        4        2       0       0        2         0
>>  5    1004    05/09/1996    05/31/1996        5        2       0       1        3         1
>>  6    1004    06/01/1996    06/30/1996        6        3       0       1        4         1
>>  7    1004    07/01/1996    01/31/1997        7        4       0       1        5         1
>>  8    1004    02/01/1997    02/28/1997        8        3       0       1        4         1
>>  9    1004    03/01/1997    11/15/1997        9        2       0       1        3         1
>> 10    1004    11/16/1997    01/04/1999       10        2       0       0        2         0
>> 11    9999    01/02/2003    04/04/2006        1        0       0       1        1         0
>> 12    9999    04/05/2006    07/07/2009        2        0       1       1        2         0
>> 13    9999    07/08/2009             .        3        0       0       1        1         0
>>
>> ==========================================================
>>
>> Paul Miller
>> Paul Miller <pjmiller_57 at yahoo.com>
>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <ted.harding at wlandres.net>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 20-Apr-11                                       Time: 19:59:21
>> ------------------------------ XFMail ------------------------------
>>
>>
>
>
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>





