[R] Column name containing "-"
Ivan Calandra
ivan.calandra at u-bourgogne.fr
Tue Jan 24 17:25:03 CET 2012
Bert,
Thank you for correcting my inaccuracy. A quick look at the original
question might help you understand what I meant:
d <- data.frame(x = c(0, 1))
d <- data.frame(d, y = c(0, 1))
names(d)[2] <- "a.-5"
d
  x a.-5
1 0    0
2 1    1
d1 <- data.frame(d, y = c(0, 1))
d1
  x a..5 y
1 0    0 0
2 1    1 1
d2 <- data.frame(d, y = c(0, 1), check.names = FALSE)
d2
  x a.-5 y
1 0    0 0
2 1    1 1
With check.names=TRUE (the default), the dash is converted to a period. With
check.names=FALSE, the dash is preserved. So the dash is not a problem
per se, since data.frame() throws neither an error nor a warning in this case.
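To make the conversion concrete, here is a minimal sketch. It assumes, as the help page for data.frame() states, that check.names=TRUE adjusts names through make.names(); the variable names `fixed`, `checked` and `unchecked` are only illustrative.

```r
# make.names() replaces characters that are not allowed in
# syntactic names (such as "-") with "."
fixed <- make.names(c("x", "a.-5", "a-3"))
print(fixed)

# Rebuild the data frame from the example above
d <- data.frame(x = c(0, 1), y = c(0, 1))
names(d)[2] <- "a.-5"

# Default (check.names = TRUE): existing names are adjusted
checked <- names(data.frame(d, y = c(0, 1)))

# check.names = FALSE: names are kept exactly as given
unchecked <- names(data.frame(d, y = c(0, 1), check.names = FALSE))

print(checked)
print(unchecked)
```

Calling make.names() directly is a quick way to preview what a name will become before building the data frame.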
My question, then, is why it is converted. To avoid problems with other
functions? To avoid confusion and mischief, as you mentioned, because it
is the symbol for subtraction? If it can be that problematic, why not
simply disallow it? I guess there are reasons for these behaviors,
and I am curious to learn more about the logic behind them.
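One possible illustration of the "mischief" is sketched below. It is not a statement of the designers' reasons, only a demonstration of how the parser treats such a name; the variable names are illustrative.

```r
d <- data.frame(x = c(0, 1), y = c(0, 1))
names(d)[2] <- "a.-5"

# A non-syntactic name must be quoted to reach the column:
via_backticks <- d$`a.-5`     # backticks
via_brackets  <- d[["a.-5"]]  # or a character string inside [[ ]]

# Without quoting, the parser reads a subtraction: (d$a.) - 5
parsed <- quote(d$a.-5)
op <- as.character(parsed[[1]])  # the function at the top of the call
print(op)
```

So a dash in a name is storable, but any code that refers to the column by an unquoted name is parsed as arithmetic, which is presumably why the default is to convert it.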
Actually, I find that data.frame() can be confusing. On the one hand, it
accepts unquoted names for columns, as in your first example. On the
other hand, it rejects an unquoted name that could be ambiguous, as in
your second example. I am definitely not experienced
enough to judge whether the behavior makes sense, but I am
curious to know why quoted strings are not required in data.frame().
Requiring them would be consistent, and therefore easier for beginners
to understand, I think.
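As far as I can tell, the `name = value` form is part of R's general call syntax rather than something specific to data.frame(): the name may be written as a bare symbol, a quoted string, or in backticks. A small sketch (variable names are illustrative):

```r
# Three spellings of the same argument name
by_symbol   <- data.frame(a = 1:3)
by_string   <- data.frame("a" = 1:3)
by_backtick <- data.frame(`a` = 1:3)

same <- identical(by_symbol, by_string) &&
  identical(by_symbol, by_backtick)
print(same)

# The same rule applies to other functions, e.g. list(), which
# does not adjust names at all
l <- list("a-3" = 1, `b-4` = 2)
print(names(l))
```

A bare `a-3 = 1:3` fails earlier, at the parsing stage, because `a-3` is an expression and not a name; data.frame() is never called in that case.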
Thank you for your insights,
Ivan
On 24/01/12 16:53, Bert Gunter wrote:
> Ivan:
>
> On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
> <ivan.calandra at u-bourgogne.fr> wrote:
>> By "it works anyway", I mean that you can have a dash in a column name,
>> there is no error or even warning.
>> I guess that some functions would throw an error or warning, depending on
>> the requirements, but data.frame() doesn't.
> This is false. Please don't guess. Read the Help pages.
>
> > data.frame(a = 1:3)    # fine
> > data.frame(a-3 = 1:3)  # Error: unexpected '=' in "data.frame(a-3 ="
> The name is **NOT** OK. However,
> > data.frame("a-3" = 1:3)  # fine
>   a.3
> 1   1
> 2   2
> 3   3
>
> ## A quoted character string can be used as a column name
> ## The name will be changed to a legal name unless:
>
> > data.frame("a-3" = 1:3, check.names = FALSE)
>   a-3
> 1   1
> 2   2
> 3   3
>
> However, as is obvious, there is much mischief possible from such
> practices, so that they are best avoided.
>
> -- Bert
>
>
>> Ivan
>>
>> On 24/01/12 15:35, David Winsemius wrote:
>>>
>>> On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> I cannot tell you why (maybe someone else can), but the check.names
>>>> argument to data.frame() interprets "a.-5" as an invalid name and converts it
>>>> to a valid one. What I don't understand is why it isn't "valid" since it
>>>> works anyway.
>>>
>>> The dash is not a valid character for column names. What do you mean by
>>> "it works anyway"?
>>>
>> --
>> Ivan CALANDRA
>> Université de Bourgogne
>> UMR CNRS/uB 6282 Biogéosciences
>> 6 Boulevard Gabriel
>> 21000 Dijon, FRANCE
>> +33(0)3.80.39.63.06
>> ivan.calandra at u-bourgogne.fr
>>