[R] How to combine character month and year columns into one column

Marc Schwartz marc_schwartz at me.com
Tue Sep 23 19:18:48 CEST 2014


Two things:

1. You need to convert the result of paste(), which is always a character vector, to a date class.

2. R's standard date classes require a full date, so you have to supply a default day of the month, such as the 1st:

See ?as.Date

NewDate <- as.Date(paste(month.abb[as.numeric(ddf$month)], "01", ddf$Year, sep="-"), 
                   format = "%b-%d-%Y")

Or you can skip month.abb, which is not really needed here. Note that the format argument uses %m instead of %b:

NewDate <- as.Date(paste(as.numeric(ddf$month), "01", ddf$Year, sep="-"), 
                   format = "%m-%d-%Y")

> class(NewDate)
[1] "Date"

> str(NewDate)
 Date[1:12], format: "1999-01-01" "1999-02-01" "1999-03-01" "1999-04-01" ...
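For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds only the month and Year columns of ddf (both character, as in the dput() output quoted below) and checks that the two approaches give the same Date vector. The Sys.setlocale() call is an added assumption, not part of the original answer: %b matches month names in the current locale, while month.abb is always English, so the first approach can return NA under a non-English LC_TIME setting.

```r
# Minimal reconstruction of the original data: both columns are character
ddf <- data.frame(
  month = sprintf("%02d", 1:12),
  Year  = rep("1999", 12),
  stringsAsFactors = FALSE
)

# %b parses month names in the current locale; switch to the C locale so
# that the English abbreviations in month.abb are recognised (assumption)
invisible(Sys.setlocale("LC_TIME", "C"))

# Approach 1: month abbreviation, fixed day "01", year
d1 <- as.Date(paste(month.abb[as.numeric(ddf$month)], "01", ddf$Year, sep = "-"),
              format = "%b-%d-%Y")

# Approach 2: numeric month, no month.abb needed
d2 <- as.Date(paste(as.numeric(ddf$month), "01", ddf$Year, sep = "-"),
              format = "%m-%d-%Y")

# Both approaches should agree element by element
stopifnot(all(d1 == d2))
print(d1[1:3])
```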


You can then format the output of NewDate as you might require:

> format(NewDate, format = "%b-%d-%Y")
 [1] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" "Apr-01-1999"
 [5] "May-01-1999" "Jun-01-1999" "Jul-01-1999" "Aug-01-1999"
 [9] "Sep-01-1999" "Oct-01-1999" "Nov-01-1999" "Dec-01-1999"


Note that the output of the last step is a character vector:

> str(format(NewDate, format = "%b-%d-%Y"))
 chr [1:12] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" ...

which is fine for display, while NewDate itself remains a Date class object that sorts chronologically and supports date arithmetic.
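To show why it is worth keeping the Date object and producing a text label only for display, here is a small sketch. It builds a stand-in for NewDate directly from ISO strings, adds a "Jan-1999" style label with the format "%b-%Y" (the label asked about in the original question), and compares how the two sort. As above, setting LC_TIME to "C" is an assumption made so that %b yields English abbreviations.

```r
# Stand-in for NewDate: the first day of each month of 1999
NewDate <- as.Date(sprintf("1999-%02d-01", 1:12))

# %b follows LC_TIME; the C locale gives English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

# Character label for display only, e.g. "Jan-1999"
MonthDay <- format(NewDate, format = "%b-%Y")

# Character labels sort alphabetically, so "Apr-1999" comes first ...
sorted_labels <- sort(MonthDay)

# ... whereas Date values sort chronologically and support arithmetic
sorted_dates <- sort(rev(NewDate))
days_in_jan  <- as.numeric(diff(NewDate[1:2]))  # days from Jan 1 to Feb 1

print(sorted_labels[1:3])
print(days_in_jan)
```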


Alternatively, I believe that Gabor's 'zoo' package on CRAN has a 'yearmon' class for this type of partial date.
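A rough sketch of the zoo route, assuming the package is installed from CRAN (the code does nothing if it is not). It parses "1999-01" style strings with zoo::as.yearmon(), so no day of the month has to be supplied, and converts back to a Date when needed. The variable names are illustrative only.

```r
# Requires the 'zoo' package from CRAN; skipped if it is not installed
if (requireNamespace("zoo", quietly = TRUE)) {
  ddf <- data.frame(month = sprintf("%02d", 1:12), Year = rep("1999", 12),
                    stringsAsFactors = FALSE)

  # Parse "1999-01", "1999-02", ... into year-month values (no day needed)
  ym <- zoo::as.yearmon(paste(ddf$Year, ddf$month, sep = "-"), format = "%Y-%m")

  print(class(ym))
  # Display as "Jan-1999" etc.; %b is locale-dependent
  print(format(ym, "%b-%Y"))
  # Convert back to Date: the first day of each month
  print(as.Date(ym[1:2]))
}
```

Internally a yearmon value is stored as the year plus (month - 1)/12, so it sorts and compares correctly without an artificial day.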

Regards,

Marc


On Sep 23, 2014, at 12:04 PM, Kuma Raj <pollaroid at gmail.com> wrote:

> Many thanks for your quick answer, which produced what I wanted. May
> I ask a follow-up question on the same issue? I failed to convert the new
> column into date format with this code. The class of MonthDay is still
> character.
> 
> df$MonthDay <- format(df$MonthDay, format=c("%b %Y"))
> I would appreciate it if you could suggest a working solution.
> Thanks
> 
> 
> On 23 September 2014 18:03, Marc Schwartz <marc_schwartz at me.com> wrote:
>> On Sep 23, 2014, at 10:41 AM, Kuma Raj <pollaroid at gmail.com> wrote:
>> 
>>> Dear R users,
>>> 
>>> I have data with month and year columns, which are both character,
>>> and wanted to create a new column like Jan-1999
>>> with the following code. The result is all NA for the month part. What
>>> is wrong with the code, and what is the right way to combine the two?
>>> 
>>> ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-" )
>>> 
>>> 
>>> Thanks
>>> 
>>>> dput(ddf)
>>> structure(list(month = c("01", "02", "03", "04", "05", "06",
>>> "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
>>> "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
>>> "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
>>> 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
>>> "MonthDay"), row.names = 109:120, class = "data.frame")
>>>> 
>>> 
>> 
>> 
>> 
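>> Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created. A character subscript selects elements by name rather than by position, and since month.abb has no names, every lookup such as month.abb["01"] returns NA.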
>> 
>> In the case of the former approach:
>> 
>>> paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-" )
>> [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
>> [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
>> 
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
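A small sketch of why the original paste() call produced "NA-1999", and of both fixes mentioned in the reply. The integer column month_num is a hypothetical name used for illustration, since the post does not show how ddf was originally built.

```r
ddf <- data.frame(month = sprintf("%02d", 1:12), Year = rep("1999", 12),
                  stringsAsFactors = FALSE)

# A character subscript selects elements *by name*; month.abb has no names,
# so "01", "02", ... match nothing and every lookup returns NA
bad <- month.abb[ddf$month]

# Converting to integer gives positional indexing, which is what was intended
good <- month.abb[as.integer(ddf$month)]

# Alternative: store the month as an integer column in the first place
# (month_num is a hypothetical column name)
ddf$month_num <- as.integer(ddf$month)
labels <- paste(month.abb[ddf$month_num], ddf$Year, sep = "-")

print(bad[1:3])
print(labels[1:3])
```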


