[R] Identify the first row of each ID within a data frame; create a variable first = 1 for the first row and first = 0 for all other rows

Sorkin, John j@ork|n @end|ng |rom @om@um@ry|@nd@edu
Sun Dec 1 03:27:29 CET 2024


Dear R help folks,

First, my apologies for sending several related questions to the list. I am trying to learn how to manipulate data in R and am having difficulty getting my program to work. I greatly appreciate the help and support that list members give!

I am trying to write a program that runs through a data frame organized by ID and creates a new variable, first, that is 1 for the first row of each group of rows sharing the same ID and 0 for all other rows.

For example, if my original data is
 olddata
   ID date
    1     1
    1     1
    1     2
    1     2
    1     3
    1     3
    1     4
    1     4
    1     5
    1     5
    2     5
    2     5
    2     5
    2     6
    2     6
    2     6
    3    10
    3    10

the new data should be
newdata
   ID date  first
    1     1       1
    1     1       0
    1     2       0
    1     2       0
    1     3       0
    1     3       0
    1     4       0
    1     4       0
    1     5       0
    1     5       0
    2     5       1
    2     5       0
    2     5       0
    2     6       0
    2     6       0
    2     6       0
    3    10       1
    3    10       0
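
For comparison, the target table above can be produced in base R without any grouping function. The following is only a minimal sketch, not a fix for the program in this post: it assumes that "first" means the first occurrence of each ID value, which matches the example because the rows are already grouped by ID. The data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data shown above.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1:5, each = 2), rep(5:6, each = 3), rep(10, 2))
)

# duplicated() is FALSE the first time a value appears and TRUE afterwards,
# so negating it marks the first row of each ID.
# Assumption: all rows for a given ID are contiguous, as in the example.
newdata <- olddata
newdata$first <- as.integer(!duplicated(newdata$ID))
print(newdata)
```

If the same ID could reappear later in the data after other IDs, this sketch would flag only its very first appearance, so the data would need to be sorted by ID beforehand.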

When I run the program below, I receive the following error:
Error in df[, "ID"] : incorrect number of dimensions

My code:
# Create data.frame
ID <- c(rep(1,10),rep(2,6),rep(3,2))
date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
          rep(5,3),rep(6,3),rep(10,2))
olddata <- data.frame(ID=ID,date=date)
class(olddata)
cat("This is the original data frame","\n")
print(olddata)
 
# This function is supposed to identify the first row 
# within each level of ID and, for the first row, set
# the variable first to 1, and for all rows other than
# the first row set first to 0.
mydoit <- function(df){
  value <- ifelse (first(df[,"ID"]),1,0)
  cat("value=",value,"\n")
  df[,"first"] <- value
}
newdata <- aggregate(olddata,list(olddata[,"ID"]),mydoit)
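
A likely source of the error, offered as a hedged reading rather than a definitive diagnosis: aggregate() applied to a data frame calls the function once per column within each group and passes it a plain vector, not a data frame, so two-dimensional indexing such as df[,"ID"] has no second dimension to index. Note also that first() is not a base R function; the code presumably relies on a package such as dplyr being attached. The sketch below first checks what aggregate() hands to its function and then uses ave(), which returns one value per input row, as one possible alternative. The helper variable name seen is purely illustrative.

```r
# Rebuild the example data so the snippet runs on its own.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Check what aggregate() passes to FUN: a vector has no dim attribute,
# so is.null(dim(x)) is TRUE for every column in every group.
seen <- aggregate(olddata, list(olddata[, "ID"]),
                  function(x) is.null(dim(x)))
print(seen)

# ave() applies FUN within each ID group and returns a vector as long as
# its input. seq_along(x) == 1 is TRUE only for the first element of a group.
newdata <- olddata
newdata$first <- ave(newdata$ID, newdata$ID,
                     FUN = function(x) as.numeric(seq_along(x) == 1))
print(newdata)
```

Unlike aggregate(), which collapses each group to a single summary row, ave() keeps all 18 rows, which is what the desired output requires.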

Thank you,
John


John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center; 
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




