[R] What's data() for?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 14 13:59:42 CEST 2010


On Fri, 14 May 2010, Duncan Murdoch wrote:

> On 14/05/2010 5:35 AM, (Ted Harding) wrote:
>> On 13-May-10 23:43:58, yjmha69 wrote:
>> 
>>> Hi there,
>>>
>>> 
>>> > library(faraway)
>>> > pima
>>>
>>>     pregnant glucose diastolic triceps insulin  bmi diabetes age test
>>> 1          6     148        72      35       0 33.6    0.627  50    1
>>> 2          1      85        66      29       0 26.6    0.351  31    0
>>>
>>> 
>>> > data(pima)
>>> > pima
>>>
>>>     pregnant glucose diastolic triceps insulin  bmi diabetes age test
>>> 1          6     148        72      35       0 33.6    0.627  50    1
>>> 2          1      85        66      29       0 26.6    0.351  31    0
>>> 
>>> As you can see, I can already use pima without running data(pima), and
>>> after running data(pima) it looks the same. So what's the reason to
>>> use data(pima)?
>>> 
>>> Thanks
>>> YJM
>>> 
>> 
>> The difference is that data(pima) will load the dataset pima
>> (which can be found in the package "faraway") without the use
>> of library(faraway). It won't load anything else from faraway.
>> 
>
> That won't work.  Unless you attach faraway, R won't know what "pima" refers
> to, and data() will just warn that the data set was not found.

But

data("pima", package="faraway")

will.  And if you do that you can rm(pima); gc() and completely remove 
the object from the session, something you cannot do with lazy-loading 
of data.

That is, I think, the main attraction of not using lazy-loading for
datasets that will be used for only a small part of a session.
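A minimal sketch of that workflow in R. It relies only on the documented list=, package= and envir= arguments of utils::data(); the helper name with_dataset is hypothetical, and mtcars from the base datasets package stands in for faraway's pima so the sketch runs without faraway installed. Loading into a private environment rather than the workspace makes it easy to drop the copy once it is no longer needed.

```r
# Hypothetical helper (not from the thread): load one data set from a package
# into a private environment, apply `fun` to it, then drop the copy.
with_dataset <- function(name, package, fun) {
  env <- new.env()
  # An explicit package= lets data() find the set without attaching the package
  data(list = name, package = package, envir = env)
  if (!exists(name, envir = env, inherits = FALSE)) {
    stop("data set '", name, "' not found in package '", package, "'")
  }
  result <- fun(get(name, envir = env, inherits = FALSE))
  rm(list = name, envir = env)  # remove our copy before returning
  result
}

# With faraway installed this could be with_dataset("pima", "faraway", summary);
# mtcars from the base "datasets" package keeps the sketch self-contained.
n_rows <- with_dataset("mtcars", "datasets", nrow)
invisible(gc())  # ask R to reclaim memory now that no copy is referenced
print(n_rows)
```

Only the copy that data() created is freed this way; a lazy-loaded package keeps its own copy available through the search path once it has been used.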

> In this case there isn't really much difference between data(pima) and
> pima, but in other cases there might be.  Prior to the introduction of lazy
> loading of data, it always made a difference: the pima object wouldn't be
> loaded into memory until requested by data(pima).  With lazy loading, a stub
> for the object is always in memory, with the main part of the object only
> loaded on first use.  Many packages (including faraway) use lazy loading of
> data, so data() is to some extent unnecessary; but there are some
> circumstances under which lazy loading won't work, so a few packages don't
> use it, and it is not the default (a package must opt in via the LazyData
> field of its DESCRIPTION file).
>
> Duncan Murdoch
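Whether a package uses lazy loading of data is recorded in the LazyData field of its DESCRIPTION file. A small sketch for checking this and for listing what data() can load, using utils::packageDescription() and data(package = ...); the helper names uses_lazy_data and datasets_in are hypothetical, and the base datasets package is queried only because it is always installed.

```r
# Hypothetical helper: TRUE if an installed package declares LazyData.
uses_lazy_data <- function(pkg) {
  # packageDescription() returns NA when the field is absent; for an ordinary
  # contributed package that means its data sets are not lazy-loaded
  field <- packageDescription(pkg, fields = "LazyData")
  !is.na(field) && tolower(field) %in% c("true", "yes")
}

# Hypothetical helper: names of the data sets data() can load from a package.
datasets_in <- function(pkg) {
  # data(package = ...) returns a "packageIQR" object whose results matrix
  # has an "Item" column holding the data set names
  data(package = pkg)$results[, "Item"]
}

# e.g. uses_lazy_data("faraway") and datasets_in("faraway") if it is installed
print(uses_lazy_data("datasets"))
print(head(datasets_in("datasets")))
```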
>> When you use library(faraway) you will load everything in the
>> package faraway, including of course the dataset pima (which is
>> why you see no difference, since that dataset is the same whichever
>> way you load it).
>> 
>> So with data() you put less load on your system, and also avoid
>> possible conflicts between what you already have in your environment
>> and what would be brought in when you do library(faraway).
>> 
>> Ted.
>> 
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 14-May-10                                       Time: 10:35:15
>> ------------------------------ XFMail ------------------------------

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list