[R] Column name containing "-"

R. Michael Weylandt michael.weylandt at gmail.com
Tue Jan 24 17:33:19 CET 2012


I've usually understood the restrictions on syntactic names as being
tied to the parser.

E.g., how could R tell the difference between

d <- data.frame(a = 3, `a-2` = 3, check.names = FALSE)
d$a-2 ## Equal to 1 or 3 ?

One of those strange eval things that makes a lot of sense for an
interactive language, but might not be the best for a formal
programming language (though I don't think it causes any serious
restrictions).
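
A minimal sketch of the two readings, assuming the column really is
called a-2 (which requires check.names = FALSE when the data frame is
built):

```r
# Keep the non-syntactic name by switching off check.names
d <- data.frame(a = 3, `a-2` = 3, check.names = FALSE)

names(d)      # "a"   "a-2"
d$a-2         # parsed as (d$a) - 2, so this is 1
d$`a-2`       # backticks make the whole thing one name, so this is 3
d[["a-2"]]    # indexing by string avoids the parser question, also 3
```

The unquoted form is always read as a subtraction, so the column named
a-2 can only be reached through backticks or a character string.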

When you force it to be a name, e.g., d$`a-2`, there's no confusion,
so it's allowed because it's potentially useful for formatting output.
(One case that comes to mind: delisted stocks are given tickers that
begin with numbers. R wants to stick an X on the front of the name,
but then you lose compatibility with your data source.)
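
For illustration, here is roughly what the name checking does. The
renaming under check.names = TRUE goes through make.names(); the ticker
"3COM" and the prices below are made up for the example:

```r
# make.names() is the function data.frame() uses when check.names = TRUE
make.names(c("a-2", "a.-5", "3COM"))   # "a.2"  "a..5" "X3COM"

# Hypothetical ticker starting with a digit, kept exactly as supplied
px <- data.frame(`3COM` = c(10.5, 10.7), check.names = FALSE)
names(px)     # "3COM"
px$`3COM`     # backticks are needed to refer to it by name
```

Keeping the original name means lookups against the data source still
match, at the cost of needing backticks or string indexing everywhere.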

Michael

On Tue, Jan 24, 2012 at 11:25 AM, Ivan Calandra
<ivan.calandra at u-bourgogne.fr> wrote:
> Bert,
>
> Thank you for correcting my inaccuracy. A quick look at the original
> question might help you understand what I meant:
>
> d<- data.frame(x = c(0, 1))
> d<- data.frame(d, y = c(0,1))
> names(d)[2]<- "a.-5"
> d
>  x a.-5
> 1 0    0
> 2 1    1
> d1<- data.frame(d, y = c(0,1))
> d1
>  x a..5 y
> 1 0    0 0
> 2 1    1 1
> d2<- data.frame(d, y = c(0,1), check.names=FALSE)
> d2
>  x a.-5 y
> 1 0    0 0
> 2 1    1 1
>
> With check.names=TRUE, the dash is converted to a period. With
> check.names=FALSE, the dash is conserved. So the dash is not a problem per
> se, because data.frame() doesn't throw an error or warning in this case.
>
> Then my question is, why is it converted? To avoid problems with other
> functions? To avoid confusion and mischief as you mentioned because it is
> the symbol for subtraction? If it can be that problematic, why not just not
> allow it at all? I guess there are reasons for these behaviors and I am
> curious to learn more about the logic behind it.
>
> Actually, I find that data.frame() can be confusing. On the one hand it
> accepts unquoted strings to define column names, like in your first example.
> But on the other hand, it doesn't accept it if it can be confusing like in
> your second example. I am definitely not experienced enough to judge whether
> the behavior makes sense or not, but I am curious to know why quoted strings
> are not required in data.frame(). This behavior would be consistent, and
> therefore easier to understand for beginners, I think.
>
> Thank you for your insights,
> Ivan
>
>
>
> Le 24/01/12 16:53, Bert Gunter a écrit :
>>
>> Ivan:
>>
>> On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
>> <ivan.calandra at u-bourgogne.fr>  wrote:
>>>
>>> By "it works anyway", I mean that you can have a dash in a column name,
>>> there is no error or even warning.
>>> I guess that some functions would throw an error or warning, depending on
>>> the requirements, but data.frame() doesn't.
>>
>> This is false. Please don't guess. Read the Help pages.
>>
>>> data.frame(a = 1:3)  #fine
>>> data.frame(a-3 = 1:3) # Error: unexpected '=' in "data.frame(a-3 ="
>>
>> The name is **NOT** OK. However,
>>>
>>> data.frame("a-3" = 1:3) # fine
>>
>>   a.3
>> 1   1
>> 2   2
>> 3   3
>>
>> ## A quoted  character string can be used as a column name
>> ## The name will be changed to a legal name unless:
>>
>>> data.frame("a-3" = 1:3,check.names=FALSE)
>>
>>   a-3
>> 1   1
>> 2   2
>> 3   3
>>
>> However, as is obvious, there is much mischief possible from such
>> practices, so that they are best avoided.
>>
>> -- Bert
>>
>>
>>> Ivan
>>>
>>> Le 24/01/12 15:35, David Winsemius a écrit :
>>>>
>>>>
>>>> On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> I cannot tell you why (maybe someone else can), but the check.names
>>>>> argument to data.frame() interprets "a.-5" as an invalid name and
>>>>> converts it to a valid one. What I don't understand is why it isn't
>>>>> "valid", since it works anyway.
>>>>
>>>>
>>>> The dash is not a valid character for column names. What do you mean by
>>>> "it works anyway"?
>>>>
>>> --
>>> Ivan CALANDRA
>>> Université de Bourgogne
>>> UMR CNRS/uB 6282 Biogéosciences
>>> 6 Boulevard Gabriel
>>> 21000 Dijon, FRANCE
>>> +33(0)3.80.39.63.06
>>> ivan.calandra at u-bourgogne.fr
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>


