[R] Column name containing "-"
Bert Gunter
gunter.berton at gene.com
Tue Jan 24 17:54:50 CET 2012
Ivan:
AFAICS you did not read my post carefully enough. An unquoted name that
is not a syntactically valid identifier triggers an error because the
input cannot be parsed. That has nothing to do with data.frame(). (**R
EXPERTS, PLEASE CORRECT ME IF I'M WRONG**.) Unquoted LEGAL names are
accepted because that's how R works -- they can be parsed. So I don't
understand your confusion. And the behavior of data.frame() with
check.names is clearly documented.
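
To make the distinction concrete, here is a minimal sketch, assuming a
reasonably recent R version. The object names res and d are only
illustrative and do not come from the earlier posts. It separates the two
stages: the parser rejecting an unquoted illegal name, and data.frame()
adjusting a quoted one through make.names() when check.names = TRUE.

```r
# Stage 1: parsing. An unquoted a-3 is not a syntactic name, so the call
# fails in the parser, before data.frame() is ever evaluated.
res <- try(parse(text = "data.frame(a-3 = 1:3)"), silent = TRUE)
inherits(res, "try-error")   # TRUE: a syntax error, not a data.frame() error

# Stage 2: name checking. A quoted name parses fine; with the default
# check.names = TRUE, data.frame() then repairs it via make.names().
make.names("a-3")    # "a.3"
make.names("a.-5")   # "a..5"

# With check.names = FALSE the name is kept as given.
d <- data.frame("a-3" = 1:3, check.names = FALSE)
names(d)             # "a-3"

# The mischief: d$a-3 is parsed as a subtraction, (d$a) - 3,
# not as a lookup of the column named "a-3".
identical(quote(d$a-3)[[1]], as.name("-"))   # TRUE

# Reaching the column requires backticks or [[ ]].
d$`a-3`
d[["a-3"]]
```

Having to write backticks or [[ ]] every time the column is used is one
practical reason such names are best avoided.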
-- Bert
On Tue, Jan 24, 2012 at 8:25 AM, Ivan Calandra
<ivan.calandra at u-bourgogne.fr> wrote:
> Bert,
>
> Thank you for correcting my inaccuracy. A quick look at the original
> question might help you understand what I meant:
>
> d <- data.frame(x = c(0, 1))
> d <- data.frame(d, y = c(0, 1))
> names(d)[2] <- "a.-5"
> d
>   x a.-5
> 1 0    0
> 2 1    1
> d1 <- data.frame(d, y = c(0, 1))
> d1
>   x a..5 y
> 1 0    0 0
> 2 1    1 1
> d2 <- data.frame(d, y = c(0, 1), check.names = FALSE)
> d2
>   x a.-5 y
> 1 0    0 0
> 2 1    1 1
>
> With check.names=TRUE, the dash is converted to a period. With
> check.names=FALSE, the dash is preserved. So the dash is not a problem per
> se, because data.frame() doesn't throw an error or warning in this case.
>
> Then my question is, why is it converted? To avoid problems with other
> functions? To avoid confusion and mischief, as you mentioned, because it is
> the symbol for subtraction? If it can be that problematic, why not simply
> disallow it? I guess there are reasons for these behaviors and I am
> curious to learn more about the logic behind them.
>
> Actually, I find that data.frame() can be confusing. On the one hand, it
> accepts unquoted names for columns, as in your first example. On the
> other hand, it doesn't accept them when they could be ambiguous, as in
> your second example. I am definitely not experienced enough to judge whether
> the behavior makes sense or not, but I am curious to know why quoted strings
> are not required in data.frame(). That would be consistent, and
> therefore easier for beginners to understand, I think.
>
> Thank you for your insights,
> Ivan
>
>
>
> On 24/01/12 16:53, Bert Gunter wrote:
>>
>> Ivan:
>>
>> On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
>> <ivan.calandra at u-bourgogne.fr> wrote:
>>>
>>> By "it works anyway", I mean that you can have a dash in a column name,
>>> there is no error or even warning.
>>> I guess that some functions would throw an error or warning, depending on
>>> the requirements, but data.frame() doesn't.
>>
>> This is false. Please don't guess. Read the Help pages.
>>
>>> data.frame(a = 1:3) #fine
>>> data.frame(a-3 = 1:3) # Error: unexpected '=' in "data.frame(a-3 ="
>>
>> The name is **NOT** OK. However,
>>>
>>> data.frame("a-3" = 1:3) # fine
>>
>>   a.3
>> 1   1
>> 2   2
>> 3   3
>>
>> ## A quoted character string can be used as a column name
>> ## The name will be changed to a legal name unless:
>>
>>> data.frame("a-3" = 1:3,check.names=FALSE)
>>
>>   a-3
>> 1   1
>> 2   2
>> 3   3
>>
>> However, as is obvious, there is much mischief possible from such
>> practices, so they are best avoided.
>>
>> -- Bert
>>
>>
>>> Ivan
>>>
>>> On 24/01/12 15:35, David Winsemius wrote:
>>>>
>>>>
>>>> On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> I cannot tell you why (maybe someone else can), but the check.names
>>>>> argument to data.frame() interprets "a.-5" as an invalid name and
>>>>> converts it to a valid one. What I don't understand is why it isn't
>>>>> "valid", since it works anyway.
>>>>
>>>>
>>>> The dash is not a valid character for column names. What do you mean by
>>>> "it works anyway"?
>>>>
>>> --
>>> Ivan CALANDRA
>>> Université de Bourgogne
>>> UMR CNRS/uB 6282 Biogéosciences
>>> 6 Boulevard Gabriel
>>> 21000 Dijon, FRANCE
>>> +33(0)3.80.39.63.06
>>> ivan.calandra at u-bourgogne.fr
>>>
>
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm