[R] How to combine character month and year columns into one column
Marc Schwartz
marc_schwartz at me.com
Tue Sep 23 19:18:48 CEST 2014
Two things:
1. You need to convert the result of paste() to a Date-related class; paste() itself always returns a character vector.
2. R's standard Date classes require a full date, so you have to add a default day of the month, such as "01".
See ?as.Date. For example:
NewDate <- as.Date(paste(month.abb[as.numeric(ddf$month)], "01", ddf$Year, sep="-"),
format = "%b-%d-%Y")
Or, without month.abb, which is not really needed here, paste the numeric month directly. Note the difference in the format argument ("%m" instead of "%b"):
NewDate <- as.Date(paste(as.numeric(ddf$month), "01", ddf$Year, sep="-"),
format = "%m-%d-%Y")
> class(NewDate)
[1] "Date"
> str(NewDate)
Date[1:12], format: "1999-01-01" "1999-02-01" "1999-03-01" "1999-04-01" ...
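As a small illustration of point 2 (a sketch using made-up strings rather than your ddf): as.Date() returns NA when the format has no day component, and a valid Date once a default day is pasted in. Numeric months are used here so the result does not depend on the locale, as "%b" does.

```r
# Hypothetical year-month strings for illustration, not taken from ddf
ym <- c("1999-01", "1999-02")

# No day component: as.Date() cannot build a full date and returns NA
no_day <- as.Date(ym, format = "%Y-%m")

# Paste in a default day of "01" and the conversion succeeds
with_day <- as.Date(paste(ym, "01", sep = "-"), format = "%Y-%m-%d")

is.na(no_day)    # TRUE for every element
class(with_day)  # "Date"
```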
You can then format the output of NewDate as you might require:
> format(NewDate, format = "%b-%d-%Y")
[1] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" "Apr-01-1999"
[5] "May-01-1999" "Jun-01-1999" "Jul-01-1999" "Aug-01-1999"
[9] "Sep-01-1999" "Oct-01-1999" "Nov-01-1999" "Dec-01-1999"
Note that the output of the last step is a character vector:
> str(format(NewDate, format = "%b-%d-%Y"))
chr [1:12] "Jan-01-1999" "Feb-01-1999" "Mar-01-1999" ...
which is fine for formatting/printing. NewDate itself is still a Date object; format() always returns character, which is why the class of your MonthDay column stayed character.
Alternatively, I believe that Gabor's 'zoo' package on CRAN has a 'yearmon' class for this type of partial date.
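A minimal sketch of that alternative, assuming the zoo package is installed (it is not part of base R). as.yearmon() and its as.Date() method come from zoo, and the input strings are made up for illustration:

```r
# Assumes zoo is installed, e.g. via install.packages("zoo")
if (requireNamespace("zoo", quietly = TRUE)) {
  # Parse year-month strings with an explicit format; no day is needed
  ym <- zoo::as.yearmon(c("1999-01", "1999-02"), format = "%Y-%m")
  print(class(ym))

  # Convert to Date when a full date is required (first day of the month)
  d <- as.Date(ym)
  print(d)
}
```

The yearmon object keeps only the year and month, so there is no need to invent a day until a real Date is wanted.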
Regards,
Marc
On Sep 23, 2014, at 12:04 PM, Kuma Raj <pollaroid at gmail.com> wrote:
> Many thanks for your quick answer, which created what I wanted. May
> I ask a follow-up question on the same issue? I failed to convert the new
> column into date format with this code. The class of MonthDay is still
> character:
>
> df$MonthDay <- format(df$MonthDay, format=c("%b %Y"))
> I would appreciate it if you could suggest a working solution.
> Thanks
>
>
> On 23 September 2014 18:03, Marc Schwartz <marc_schwartz at me.com> wrote:
>> On Sep 23, 2014, at 10:41 AM, Kuma Raj <pollaroid at gmail.com> wrote:
>>
>>> Dear R users,
>>>
>>> I have data with month and year columns, both character, and wanted
>>> to create a new column like Jan-1999 with the following code. The
>>> result is NA for the month part in every row. What is wrong with the
>>> code, and what is the right way to combine the two?
>>>
>>> ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-" )
>>>
>>>
>>> Thanks
>>>
>>>> dput(ddf)
>>> structure(list(month = c("01", "02", "03", "04", "05", "06",
>>> "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
>>> "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
>>> "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
>>> 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
>>> "MonthDay"), row.names = 109:120, class = "data.frame")
>>>>
>>>
>>
>>
>>
>> Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created.
>>
>> In the case of the former approach:
>>
>>> paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-" )
>> [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
>> [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
>>
>>
>> Regards,
>>
>> Marc Schwartz
>>
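To see why the original code in the quoted question gave NA for every month (a short sketch using single values rather than ddf): a character subscript is matched against element names, and month.abb has none, so the lookup returns NA. A numeric subscript selects by position.

```r
# Character subscript: matched against names(month.abb), which is NULL
by_name <- month.abb["01"]

# Numeric subscript: as.numeric("01") is 1, so this picks the first element
by_position <- month.abb[as.numeric("01")]

by_name      # NA
by_position  # "Jan"
```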