[R] indexing question

Thomas Lumley tlumley at u.washington.edu
Wed Jan 14 08:23:32 CET 2009


There are real examples; they are all fairly obscure.  It can't be a big problem because the standard formal argument name for a data frame in modelling and graphics functions is 'data'.  That's actually a more serious problem than the function called data() -- having local and global variables with the same name won't confuse R, but it can easily confuse you.
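
To make the point above concrete, here is a minimal sketch (not part of the original thread; it assumes a fresh session with the standard datasets and stats packages attached). When a name is used in call position, R looks for a function of that name and skips non-function objects, so a data frame called 'data' does not hide data().

```r
# A data frame named 'data' in the workspace.
data <- data.frame(x = 1:10, y = 101:110)

# In call position R searches for a *function* called 'data', so the
# data frame above is skipped and utils::data() is still found.
data(ToothGrowth)
still_a_frame <- is.data.frame(data)
loaded <- exists("ToothGrowth")

# 'data' is also the usual formal argument name in modelling functions;
# inside lm() it refers to whatever was passed in, not to the function.
fit <- lm(len ~ dose, data = ToothGrowth)
```

R is not confused at any step here; the risk is only that a reader has to work out which 'data' is meant on each line.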

Possibilities for R getting confused include
  1. The functions for environment access by name, e.g. exists() and get(), don't by default check the mode of the object they find.
  2. bquote() and substitute() substitute before evaluating and could get confused.
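
A rough sketch of both possibilities, added for illustration (the object name my_frame is hypothetical, and the example assumes a data frame called 'data' has been created in the workspace):

```r
data <- data.frame(x = 1:3)

# 1. With the default mode = "any", get() returns the first object named
#    'data' on the search path, which here is the data frame.
first_hit <- get("data")
first_is_frame <- is.data.frame(first_hit)

#    Asking explicitly for a function skips the data frame.
fun_hit <- get("data", mode = "function")
fun_is_function <- is.function(fun_hit)

#    exists() is TRUE either way, so on its own it cannot tell you which
#    kind of object was found.
any_hit <- exists("data")
fun_exists <- exists("data", mode = "function")

# 2. substitute() replaces symbols before anything is evaluated, including
#    a symbol in call position, so a binding for 'data' rewrites the call.
rewritten <- substitute(data(ToothGrowth), list(data = quote(my_frame)))
rewritten_text <- deparse(rewritten)
```

Passing mode = "function" to get() or exists() is one way to avoid the first problem when you really do want the function.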

There used to be real problems in S when certain function names were used as data names.  Then there was a period of aversive conditioning by irritating warnings. As a result, I still avoid 'c' and 't' as variable names.

You could call your data frames 'df' -- many of the people who complain about 'data' don't realise that df() is the density function of the F distribution :)
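
As a small illustration of the same lookup rule (a sketch only, assuming the stats package is attached as it is by default), a data frame called 'df' and the F density df() can coexist:

```r
# A data frame named 'df' in the workspace.
df <- data.frame(f = c(0.5, 1, 2))

# In call position R still finds stats::df(), the density of the
# F distribution, because the data frame is not a function.
dens <- df(1, df1 = 5, df2 = 10)
same_as_stats <- isTRUE(all.equal(dens, stats::df(1, 5, 10)))
```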

      -thomas



On Tue, 13 Jan 2009, Ista Zahn wrote:

> On Tue, Jan 13, 2009 at 10:23 AM, jim holtman <jholtman at gmail.com> wrote:
>
>> How about this:
>>
>>> data(ToothGrowth)
>>> ls()
>> [1] "ToothGrowth"
>>> data <- function(x){invisible(NULL)}
>>> data(ToothGrowth)
>>> ls()
>> [1] "data"
>>>
>>
> Yep, that sure does cause a problem alright. Is it the case that
> problems arise when you name a function with the same name as an existing
> function? Or are there cases where naming data.frames, vectors, matrices,
> etc. can also cause problems?
>
> I hope I'm not being annoying -- I'm just trying to determine if I need to
> break my habit of naming data.frames "data".
>
> Thanks,
> Ista
>
>
>>
>>
>> On Tue, Jan 13, 2009 at 9:53 AM, Ista Zahn <istazahn at gmail.com> wrote:
>>> From: baptiste auguie <ba208 at exeter.ac.uk>
>>> To: Dimitris Rizopoulos <d.rizopoulos at erasmusmc.nl>
>>> Date: Tue, 13 Jan 2009 09:38:09 +0000
>>> Subject: Re: [R] indexing question
>>>
>>>> you can also look at subset,
>>>>
>>>>
>>>>        my.data.frame <- data.frame(a=rnorm(10),
>>>>            b=factor(sample(letters[1:4], 10, replace=TRUE)))
>>>>        str(my.data.frame)
>>>>        my.data.frame[my.data.frame$b == "a", ]
>>>>        subset(my.data.frame, b == "a")
>>>>
>>>> by the way, it is probably safer not to use "data" as a variable name as it
>>>> is also a function.
>>>>
>>>
>>> I've often wondered about this. The thing is, I've never run into a problem
>>> with this. For example:
>>>
>>>> ls()
>>> character(0)
>>>> data(ToothGrowth)
>>>> ls()
>>> [1] "ToothGrowth"
>>>> rm(ToothGrowth)
>>>> ls()
>>> character(0)
>>>> data <- data.frame(1:10, 101:110)
>>>> data(ToothGrowth) #works just the same
>>>> ls()
>>> [1] "data"        "ToothGrowth"
>>>>
>>>
>>> In this example the data command works just the same the second time, even
>>> though I have a data.frame named data. Can someone give an example where
>>> this causes a problem?
>>>
>>> Thanks,
>>> Ista
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



