[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements
@vi@e@gross m@iii@g oii gm@ii@com
Wed Nov 27 19:27:28 CET 2024
John,
If I understand you correctly, you want the minimum value of Day within
each ID group, stored in a new column. Right?
There are likely many ways to do this in base R, but I prefer the dplyr
package from the tidyverse, in which you can pipe group_by(ID) into
mutate(FirstDay = min(Day)). Note that the data frame your code builds
names the column date rather than Day, so with that data it would be
min(date).
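A minimal sketch of both routes, assuming the olddata frame from the
original post (where the Day column is named date). The base R version
uses ave(); the dplyr version runs only if that package is installed.
Neither has been tuned beyond this toy example.

```r
# Rebuild the example data from the original post
# (the Day column is called 'date' there).
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Base R: ave() applies FUN within each ID group and returns a vector
# as long as its input, so the group minimum is repeated on every row.
newdata <- olddata
newdata$FirstDay <- ave(olddata$date, olddata$ID, FUN = min)

# dplyr: group by ID, then add the per-group minimum as a new column.
# Guarded so the script still runs when dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  newdata_dplyr <- olddata %>%
    group_by(ID) %>%
    mutate(FirstDay = min(date)) %>%
    ungroup()
}

print(newdata)
```

min() reproduces the SAS first.ID logic here because the SAS step sorts
by ID and Day before the data step. To copy the first row of each group
in its current order instead, FUN = function(x) x[1] in ave() or
dplyr::first(date) in mutate() would be the analogous choice.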
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Sorkin, John
Sent: Wednesday, November 27, 2024 11:31 AM
To: r-help using r-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: [R] R Processing dataframe by group - equivalent to SAS by group
processing with a first. and retain statements
I am an old, long-time SAS programmer. I need to produce R code that
processes a dataframe in a manner equivalent to a SAS data step that uses
a by statement, an if first.ID statement, and a retain statement:
I want to take data (olddata) that looks like this
ID Day
1 1
1 1
1 2
1 2
1 3
1 3
1 4
1 4
1 5
1 5
2 5
2 5
2 5
2 6
2 6
2 6
3 10
3 10
and make it look like this
(within each ID I am copying the first value of Day into a new variable,
FirstDay, and propagating the FirstDay value through all rows that have the
same ID):
ID Day FirstDay
1 1 1
1 1 1
1 2 1
1 2 1
1 3 1
1 3 1
1 4 1
1 4 1
1 5 1
1 5 1
2 5 5
2 5 5
2 5 5
2 6 5
2 6 5
2 6 5
3 10 10
3 10 10
SAS code that can do this is:
proc sort data=olddata;
by ID Day;
run;
data newdata;
retain FirstDay;
set olddata;
by ID;
if first.ID then FirstDay=Day;
run;
I have NO idea how to do this in R (so I can't post test code), but below I
have R code that creates olddata:
ID <- c(rep(1,10),rep(2,6),rep(3,2))
date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
rep(5,3),rep(6,3),rep(10,2))
date
olddata <- data.frame(ID=ID,date=date)
olddata
Any suggestions on how to do this would be appreciated. I have worked on
this for more than 12 hours, and despite multiple web searches I have gotten
nowhere.
Thanks
John
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382