[R] parsing DOB data

Jim Lemon drjimlemon using gmail.com
Fri Apr 17 09:15:09 CEST 2020


Hi Peter,
I worked out a neat function to add the century to short dates. It
works fine on its own, but sadly it bombs when used with sapply. Maybe
someone else can point out my mistake:

add_century<-function(x,changeover=68,previous=19,current=20,pos=1,sep="-") {
 xsplit<-unlist(strsplit(x,sep))
 # only add century to short dates
 if(nchar(xsplit[pos]) < 3) {
  century<-ifelse(as.numeric(xsplit[pos]) <= changeover,current,previous)
  xsplit[pos]<-paste0(century,xsplit[[pos]])
 }
 return(paste(xsplit,collapse=sep))
}
# these work
add_century(x3[1],changeover=1,pos=3,sep="/")
add_century(x3[2],changeover=1,pos=3,sep="/")
add_century(x3[3],changeover=1,pos=3,sep="/")
# this doesn't
sapply(x3,add_century,list(changeover=1,pos=3,sep="/"))
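
A likely cause of the failing call: sapply hands everything after the function name straight to that function as extra arguments. The list(...) is therefore matched as one positional argument (it lands in changeover), while pos and sep keep their defaults. A minimal sketch of passing the arguments by name instead. The x3 values are made up for illustration, since the real x3 is not shown in this thread, and USE.NAMES=FALSE is only there to keep the printed result short:

```r
# add_century as defined above, repeated so this snippet runs on its own
add_century <- function(x, changeover = 68, previous = 19, current = 20,
                        pos = 1, sep = "-") {
  xsplit <- unlist(strsplit(x, sep))
  # only add century to short dates
  if (nchar(xsplit[pos]) < 3) {
    century <- ifelse(as.numeric(xsplit[pos]) <= changeover, current, previous)
    xsplit[pos] <- paste0(century, xsplit[[pos]])
  }
  paste(xsplit, collapse = sep)
}

# hypothetical input: the real x3 is not shown in this thread
x3 <- c("03/12/45", "24/06/01", "15/9/64")

# pass the extra arguments by name, not wrapped in list()
fixed <- sapply(x3, add_century, changeover = 1, pos = 3, sep = "/",
                USE.NAMES = FALSE)
print(fixed)
# [1] "03/12/1945" "24/06/2001" "15/9/1964"

# if the arguments already live in a list, do.call can splice them in
args <- list(changeover = 1, pos = 3, sep = "/")
spliced <- do.call(sapply, c(list(X = x3, FUN = add_century,
                                  USE.NAMES = FALSE), args))
```

Both calls should give the same character vector. Without USE.NAMES=FALSE the result carries the input strings as names, which is harmless but noisy.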

Jim

On Fri, Apr 17, 2020 at 11:30 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Peter,
> One way is to process the strings before converting them to dates:
>
> x2<-c("45-12-03","01-06-24","04-9-15","1901-03-04")
> add_century<-function(x,changeover=68,previous=19,current=20) {
>  centuries<-sapply(sapply(x,strsplit,"-"),"[",1)
>  shortyears<-which(!(nchar(centuries)>2))
>  century<-rep("",length(x))
>  century[shortyears]<-ifelse(as.numeric(centuries[shortyears])>changeover,previous,current)
>  newx<-paste0(century,x)
>  return(newx)
> }
> add_century(x2,1)
>
> Jim
>
> On Fri, Apr 17, 2020 at 12:34 AM Peter Nelson via R-help
> <r-help using r-project.org> wrote:
> >
> > I have a data set (.csv) with date (e.g., date of birth) information stored as character vectors that I’m attempting to transform to POSIXct objects using the package lubridate (1.7.4). The problem that I’m trying to address is that my two-digit years are invariably (?) parsed to 20xx. For example,
> >
> > x <- c("45-12-03","01-06-24","64-9-15")
> > ymd(x)
> > [1] "2045-12-03" "2001-06-24" "2064-09-15"
> >
> > These should be parsed as “1945-12-03” “2001-06-24” “1964-09-15”.
> >
> > I've tried to use parse_date_time()—based on the documentation it looks to me as though the argument cutoff_2000 should allow me to address this, but it’s unclear to me how to implement this. As an example, I’ve tried
> >
> > parse_date_time(x, cutoff_2000 = 01)
> >
> > but get the following error message (and similar for other similar attempts, including cutoff_2000 = 01L)
> >
> > Error in parse_date_time(x, cutoff_2000 = 1) :
> >   unused argument (cutoff_2000 = 1)
> >
> > Thanks for your help!
> >
> > Peter Nelson, PhD
> > Institute of Marine Sciences
> > University of California, Santa Cruz
> > Center for Ocean Health, Long Marine Lab
> > 115 McAllistair Way
> > Santa Cruz, CA, 95076, USA
> > 707-267-5896


