[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements

Olivier Crouzet olivier.crouzet at univ-nantes.fr
Wed Nov 27 18:13:29 CET 2024


Dear John,

Considering that you've got the following dataframe:

ID <- c(rep(1,10),rep(2,6),rep(3,2))
date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
          rep(5,3),rep(6,3),rep(10,2))
df <- data.frame(ID, date)

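I would suggest doing it this way (dplyr has to be loaded so that %>%
and first() are available):

library(dplyr)

# Within each ID, copy the first value of date onto every row
newdf <- df %>%
  group_by(ID) %>%
  mutate(firstday = first(date))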

Which produces:

> newdf
# A tibble: 18 × 3
# Groups:   ID [3]
      ID  date firstday
   <dbl> <dbl>    <dbl>
 1     1     1        1
 2     1     1        1
 3     1     2        1
 4     1     2        1
 5     1     3        1
 6     1     3        1
 7     1     4        1
 8     1     4        1
 9     1     5        1
10     1     5        1
11     2     5        5
12     2     5        5
13     2     5        5
14     2     6        5
15     2     6        5
16     2     6        5
17     3    10       10
18     3    10       10
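
In case you would rather not load dplyr, here is a minimal base R sketch
of the same idea. It is only one possible approach, not the only way to
do it. It assumes the columns are named ID and date as in your test
data, and it sorts first because your SAS code runs proc sort before the
data step; the name firstday is just my choice.

```r
# Rebuild the example data from the original post.
ID <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
df <- data.frame(ID, date)

# Mirror the SAS "proc sort; by ID Day;" step so that the first row
# of each ID really is its earliest date.
df <- df[order(df$ID, df$date), ]

# ave() applies FUN within each level of df$ID and returns a vector as
# long as df$date, so the group's first value is repeated on every row.
df$firstday <- ave(df$date, df$ID, FUN = function(x) x[1])
```

Because ave() returns one value per input row, no separate merge or
retain-like step is needed to propagate the value through the group.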


I think this does what you need.
Olivier.


Tom Woolman <twoolman using ontargettek.com> wrote:

> Check out the dplyr package, specifically the mutate function.
> 
> # Create new column based on existing column value 
> 
> df <- df %>% mutate(FirstDay = case_when(ID == 2 ~ 5))
> 
> df
> 
> 
> 
> Repeat as needed to capture all of the day/firstday combinations you
> want to account for.
> 
> Like everything else in R, there are probably at least a dozen other
> ways to do this, between base R and all of the library packages
> available.
> 
> 
> 
> 
> On Wednesday, November 27th, 2024 at 11:30 AM, Sorkin, John
> <jsorkin using som.umaryland.edu> wrote:
> 
> > 
> > 
> > I am an old, long time SAS programmer. I need to produce R code
> > that processes a dataframe in a manner that is equivalent to that
> > produced by using a by statement in SAS and an if first.day
> > statement and a retain statement:
> > 
> > I want to take data (olddata) that looks like this
> > ID Day
> > 1 1
> > 1 1
> > 1 2
> > 1 2
> > 1 3
> > 1 3
> > 1 4
> > 1 4
> > 1 5
> > 1 5
> > 2 5
> > 2 5
> > 2 5
> > 2 6
> > 2 6
> > 2 6
> > 3 10
> > 3 10
> > 
> > and make it look like this:
> > (within each ID I am copying the first value of Day into a new
> > variable, FirstDay, and propagating the FirstDay value through all
> > rows that have the same ID):
> > 
> > ID Day FirstDay
> > 1 1 1
> > 1 1 1
> > 1 2 1
> > 1 2 1
> > 1 3 1
> > 1 3 1
> > 1 4 1
> > 1 4 1
> > 1 5 1
> > 1 5 1
> > 2 5 5
> > 2 5 5
> > 2 5 5
> > 2 6 5
> > 2 6 5
> > 2 6 5
> > 3 10 10
> > 3 10 10
> > 
> > SAS code that can do this is:
> > 
> > proc sort data=olddata;
> > by ID Day;
> > run;
> > 
> > data newdata;
> > retain FirstDay;
> > set olddata;
> > by ID;
> > if first.ID then FirstDay=Day;
> > run;
> > 
> > I have NO idea how to do this in R (so I can't post test-code), but
> > below I have R code that creates olddata:
> > 
> > ID <- c(rep(1,10),rep(2,6),rep(3,2))
> > date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
> > rep(5,3),rep(6,3),rep(10,2))
> > date
> > olddata <- data.frame(ID=ID,date=date)
> > olddata
> > 
> > Any suggestions on how to do this would be appreciated. I have
> > worked on this for more than 12 hours; despite multiple web searches,
> > I have gotten nowhere.
> > 
> > Thanks
> > John
> > 
> > 
> > 
> > 
> > John David Sorkin M.D., Ph.D.
> > Professor of Medicine, University of Maryland School of Medicine;
> > Associate Director for Biostatistics and Informatics, Baltimore VA
> > Medical Center Geriatrics Research, Education, and Clinical Center;
> > PI Biostatistics and Informatics Core, University of Maryland
> > School of Medicine Claude D. Pepper Older Americans Independence
> > Center; Senior Statistician University of Maryland Center for
> > Vascular Research;
> > 
> > Division of Gerontology and Palliative Care,
> > 10 North Greene Street
> > GRECC (BT/18/GR)
> > Baltimore, MD 21201-1524
> > Cell phone 443-418-5382
> > 
> > 
> > 
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html and provide commented,
> > minimal, self-contained, reproducible code.
> 


-- 
  Olivier Crouzet, PhD
  http://olivier.ghostinthemachine.space
  /Maître de Conférences/
  @LLING - Laboratoire de Linguistique de Nantes
    UMR6310 CNRS / Université de Nantes


