[R] Column name containing "-"

Ivan Calandra ivan.calandra at u-bourgogne.fr
Tue Jan 24 17:25:03 CET 2012


Bert,

Thank you for correcting my inaccuracy. A quick look at the original 
question might help you understand what I meant:

d <- data.frame(x = c(0, 1))
d <- data.frame(d, y = c(0, 1))
names(d)[2] <- "a.-5"
d
  x a.-5
1 0    0
2 1    1
d1 <- data.frame(d, y = c(0, 1))
d1
  x a..5 y
1 0    0 0
2 1    1 1
d2 <- data.frame(d, y = c(0, 1), check.names = FALSE)
d2
  x a.-5 y
1 0    0 0
2 1    1 1

With check.names=TRUE (the default), the dash is converted to a period. With
check.names=FALSE, the dash is preserved. So the dash is not a problem
per se: data.frame() throws neither an error nor a warning in this case.
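According to ?data.frame, check.names = TRUE (the default) adjusts the column names with make.names(). A minimal base-R sketch to check this on the names from the example above; is_syntactic() is a hypothetical helper written only for illustration, not a base R function:

```r
# Column names from the example above
nms <- c("x", "a.-5", "y")

# make.names() turns every character that is not allowed in a
# syntactic R name into "."; unique = TRUE also de-duplicates
make.names(nms, unique = TRUE)   # c("x", "a..5", "y")

# data.frame() with the default check.names = TRUE gives the same names
identical(names(data.frame(x = 0, "a.-5" = 0, y = 0)),
          make.names(nms, unique = TRUE))   # TRUE

# Hypothetical helper, not part of base R: treat a name as
# syntactically valid when make.names() leaves it unchanged
is_syntactic <- function(x) make.names(x) == x
is_syntactic(nms)                # c(TRUE, FALSE, TRUE)
```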

So my question is: why is it converted? To avoid problems with other
functions? To avoid the confusion and mischief you mentioned, because it
is the subtraction operator? If it can be that problematic, why not
disallow it altogether? I guess there are reasons for these behaviors,
and I am curious to learn more about the logic behind them.
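One plausible reason for the conversion (an illustration, not an official statement of R's design rationale): a non-syntactic name cannot be typed unquoted, because the parser reads a.-5 as a. minus 5. The sketch below parses such an expression without evaluating it, then shows the two quoted forms that do reach the column:

```r
# A data frame that keeps the dash, as with check.names = FALSE
d2 <- data.frame(x = c(0, 1), "a.-5" = c(0, 1), check.names = FALSE)

# Parse (without evaluating) the unquoted column access
e <- parse(text = "d2$a.-5")[[1]]
e[[1]]          # `-`: the top-level call is a subtraction
e[[2]]          # d2$a.  i.e. column "a." minus 5

# Non-syntactic names must be quoted to be used directly
d2$`a.-5`       # backticks
d2[["a.-5"]]    # character index
```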

Actually, I find data.frame() somewhat confusing. On the one hand, it
accepts unquoted names to define columns, as in your first example. On
the other hand, it rejects them when they could be ambiguous, as in your
second example. I am certainly not experienced enough to judge whether
this behavior makes sense, but I am curious why quoted strings are not
required in data.frame(). Requiring them would be consistent, and
therefore easier for beginners to understand, I think.
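It may help to separate two layers here. Argument names such as a in data.frame(a = 1:3) are handled by the R parser, not by data.frame(); a quoted ("a-3") or backquoted (`a-3`) name is just another spelling of an argument name. A minimal base-R sketch showing that all spellings reach data.frame() the same way, and that a-3 = 1:3 fails at parse time, before data.frame() is ever called:

```r
# Three spellings of the same argument name
n1 <- names(data.frame(a = 1:3))
n2 <- names(data.frame("a" = 1:3))
n3 <- names(data.frame(`a` = 1:3))
identical(n1, n2) && identical(n2, n3)   # TRUE

# Quoted and backquoted non-syntactic names behave the same way
names(data.frame("a-3" = 1:3))                       # "a.3"
names(data.frame(`a-3` = 1:3, check.names = FALSE))  # "a-3"

# Unquoted a-3 = ... is rejected by the parser itself, so
# data.frame() never runs; here we capture the parse error message
res <- tryCatch(parse(text = "data.frame(a-3 = 1:3)"),
                error = function(err) conditionMessage(err))
res   # a character string holding the parse error message
```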

Thank you for your insights,
Ivan



On 24/01/12 16:53, Bert Gunter wrote:
> Ivan:
>
> On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
> <ivan.calandra at u-bourgogne.fr>  wrote:
>> By "it works anyway", I mean that you can have a dash in a column name
>> without any error or even a warning.
>> I guess that some functions would throw an error or warning, depending on
>> their requirements, but data.frame() doesn't.
> This is false. Please don't guess. Read the Help pages.
>
>> data.frame(a = 1:3)  #fine
>> data.frame(a-3 = 1:3) # Error: unexpected '=' in "data.frame(a-3 ="
> The name is **NOT** OK. However,
>> data.frame("a-3" = 1:3) # fine
>    a.3
> 1   1
> 2   2
> 3   3
>
> ## A quoted character string can be used as a column name
> ## The name will be changed to a legal name unless:
>
>> data.frame("a-3" = 1:3,check.names=FALSE)
>    a-3
> 1   1
> 2   2
> 3   3
>
> However, as is obvious, there is much mischief possible from such
> practices, so that they are best avoided.
>
> -- Bert
>
>
>> Ivan
>>
>> On 24/01/12 15:35, David Winsemius wrote:
>>>
>>> On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> I cannot tell you why (maybe someone else can), but the check.names
>>>> argument to data.frame() interprets "a.-5" as an invalid name and converts
>>>> it to a valid one. What I don't understand is why it isn't "valid", since
>>>> it works anyway.
>>>
>>> The dash is not a valid character for column names. What do you mean by
>>> "it works anyway"?
>>>
>> --
>> Ivan CALANDRA
>> Université de Bourgogne
>> UMR CNRS/uB 6282 Biogéosciences
>> 6 Boulevard Gabriel
>> 21000 Dijon, FRANCE
>> +33(0)3.80.39.63.06
>> ivan.calandra at u-bourgogne.fr
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



