[R] How to combine character month and year columns into one column
Marc Schwartz
marc_schwartz at me.com
Tue Sep 23 20:38:39 CEST 2014
Hi David,
My initial reaction (not that the decision is mine to make) is that, from a technical perspective, indexing by name is certainly common.
There are two considerations, off the top of my head:
1. With the names you propose in place, there would of course be a difference between:
> month.abb["1"]
<NA>
  NA
and
> month.abb["01"]
   01
"Jan"
Thus, is this approach overly fragile, and is it likely to create more problems (bugs, head scratching, etc.) than it solves?
2. From a consistency standpoint, I don't see any indication that other built-in constants have a similar names attribute, though I did not do an exhaustive review. So I suspect that if there were reasonable justification for it here, it would also need to at least be considered for the other constants, which increases the scope of work a good bit.
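To make the first point concrete, here is a minimal sketch using a local named copy of month.abb. The names `mon`, `x`, `by_name`, `by_pos` and `by_pad` are only for illustration, and the built-in constant itself is left untouched. It assumes the month strings may arrive either zero-padded or not:

```r
# Local named copy; 'mon' is a hypothetical name, base R's month.abb is not modified
mon <- setNames(month.abb, sprintf("%02d", 1:12))

# Mixed zero-padded and unpadded month strings
x <- c("1", "01", "9", "12")

# Lookup by name: gives NA wherever the string is not zero-padded
by_name <- unname(mon[x])

# Lookup by position: coercing to integer makes the padding irrelevant
by_pos <- month.abb[as.integer(x)]

# Lookup by name after normalizing the strings to two digits
by_pad <- unname(mon[sprintf("%02d", as.integer(x))])
```

Only the name-based lookup on the raw strings is sensitive to the padding, which is the fragility I have in mind.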
If there is a desire for this, one could file an RFE at https://bugs.r-project.org to gauge the reactions from R Core, unless they comment here first.
Regards,
Marc
On Sep 23, 2014, at 12:47 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> Marc;
>
> Feature request:
>
> Would it make sense to construct month.abb as a named vector so that the operation that was attempted would have succeeded? Adding alphanumeric names c("01", "02", "03", "04", "05", "06",
> "07", "08", "09", "10", "11", "12") would allow character extraction from substring or regex extracted month values which are always character-class.
>
> Example:
>
>> names(month.abb) <- c("01", "02", "03", "04", "05", "06",
> + "07", "08", "09", "10", "11", "12")
>> month.abb
> 01 02 03 04 05 06 07 08 09 10 11 12
> "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
>
>
>> month.abb[ substr(Sys.Date(), 6,7) ]
> 09
> "Sep"
>
> --
> David.
>
> On Sep 23, 2014, at 9:03 AM, Marc Schwartz wrote:
>
>> On Sep 23, 2014, at 10:41 AM, Kuma Raj <pollaroid at gmail.com> wrote:
>>
>>> Dear R users,
>>>
>>> I have data with month and year columns, both of which are character,
>>> and I want to create a new column like Jan-1999
>>> with the following code. The result is all NA for the month part. What
>>> is wrong with the code, and what is the right way to combine the two?
>>>
>>> ddf$MonthDay <- paste(month.abb[ddf$month], ddf$Year, sep="-" )
>>>
>>>
>>> Thanks
>>>
>>>> dput(ddf)
>>> structure(list(month = c("01", "02", "03", "04", "05", "06",
>>> "07", "08", "09", "10", "11", "12"), Year = c("1999", "1999",
>>> "1999", "1999", "1999", "1999", "1999", "1999", "1999", "1999",
>>> "1999", "1999"), views = c(42, 49, 44, 38, 37, 35, 38, 39, 38,
>>> 39, 38, 46), MonthDay = c("NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999", "NA-1999",
>>> "NA-1999", "NA-1999")), .Names = c("month", "Year", "views",
>>> "MonthDay"), row.names = 109:120, class = "data.frame")
>>>>
>>>
>>
>>
>>
>> Since you are trying to use ddf$month as an index into month.abb, you will either need to coerce ddf$month to numeric in your code, or adjust how the data frame is created.
>>
>> In the case of the former approach:
>>
>>> paste(month.abb[as.numeric(ddf$month)], ddf$Year, sep="-" )
>> [1] "Jan-1999" "Feb-1999" "Mar-1999" "Apr-1999" "May-1999" "Jun-1999"
>> [7] "Jul-1999" "Aug-1999" "Sep-1999" "Oct-1999" "Nov-1999" "Dec-1999"
>>
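For the latter approach (adjusting how the data frame is created), a minimal sketch, assuming the month can be stored as an integer when the data are built. The names `ddf2` and `MonthYear` are illustrative and not from the original post:

```r
# Hypothetical rebuild of the poster's data with month stored as integer
ddf2 <- data.frame(month = 1:12,
                   Year  = rep("1999", 12),
                   stringsAsFactors = FALSE)

# Integer months index month.abb by position, so no coercion is needed here
ddf2$MonthYear <- paste(month.abb[ddf2$month], ddf2$Year, sep = "-")
```

With the month held as an integer, the original paste() call works unchanged apart from the column names.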
>>
>> Regards,
>>
>> Marc Schwartz
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>