[R] Cacheing in read.table/ attached data?

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Nov 21 08:41:32 CET 2005


When you re-create a data frame, you do not change the attached version.
This is not caching, but the documented behaviour of attach().  The help 
page says

      The database is not actually attached.  Rather, a new environment
      is created on the search path and the elements of a list
      (including columns of a dataframe) or objects in a save file are
      _copied_ into the new environment.
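
A minimal sketch of that copying behaviour, using a hypothetical toy data 
frame 'df' (not your SG data) and assuming there are no objects called x or 
y in your workspace:

```r
# Hypothetical toy data, for illustration only
df <- data.frame(x = 1:4, y = c(10, 20, 30, 900))

attach(df)                    # columns are copied onto the search path
df <- df[df$y < 100, ]        # re-create df without the out-of-range row

n_attached <- length(y)       # 'y' is found in the attached copy: still 4 values
n_current  <- nrow(df)        # the data frame itself now has 3 rows

detach(df)                    # remove the stale copy from the search path
attach(df)                    # copy the current df
n_reattached <- length(y)     # now 3 values
detach(df)
```

After the first attach(), 'y' refers to the copy made at that time, so 
re-creating df does not change it until you detach and attach again.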

However, the code you give does not actually attach SG before using it, so 
I guess you did not run exactly this code.  (You also need 
library(fBasics).)

Try using

with(SG, circlesPlot(A10Holdings,A3Yr, size=NetAssets))

Then if you want to drop points, use some means (e.g. identify() on a 
normal plot) to find which they are, put their indices in variable 'drop' 
and use

with(SG[-drop, ], circlesPlot(A10Holdings,A3Yr, size=NetAssets))
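
As a rough sketch of that workflow: the data frame 'SGtoy' and its values 
below are made up, which() on a threshold stands in for identify() (which 
needs an interactive device), and base R's symbols() stands in for 
circlesPlot() so that the example does not depend on fBasics.

```r
# Hypothetical stand-in for SG; the values are invented for illustration
SGtoy <- data.frame(A10Holdings = c(20, 35, 50, 400),
                    A3Yr        = c(5, 8, 12, 9),
                    NetAssets   = c(100, 250, 80, 60))

# Indices of the rows to drop (here: an assumed out-of-range threshold)
drop <- which(SGtoy$A10Holdings > 100)

# Work on the trimmed data frame directly; nothing is attached
trimmed <- SGtoy[-drop, ]
fit <- lm(A3Yr ~ A10Holdings, data = trimmed)

with(trimmed, symbols(A10Holdings, A3Yr, circles = sqrt(NetAssets),
                      inches = 0.25))
abline(coef(fit))             # regression line fitted without the dropped row
```

Note that SG[-drop, ] selects no rows at all if 'drop' is empty, so check 
length(drop) first or subset with a logical condition instead.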


On Sun, 20 Nov 2005, Vivek Satsangi wrote:

> Disclaimer/Apology: I am an R newbie
>
> I am seeing some behaviour that seems to me to be the result of some
> cacheing going on at some level, and perhaps this is expected behaviour. I
> would just like to understand the basic rules.
>
> What I have is a file with some data. I read it in and then do a summary on
> the resulting dataframe. I find that some values are completely outside the
> expected range; these values need to be dropped from further analysis as
> erroneous observations (yes, I apologize to the purists in advance :-) ).
>
> If I do this and read the file again, then circlesPlot (from fBasics) two of
> the columns in the data, then the plot is not updated. The outlier point is
> still there. However, when I detach and reattach the dataframe, it seems to
> work okay. For example,
> # Plot has the outlier point in it.
> # Edit the file, commenting out the outlier line, save, then...
>> SG <- read.table("c:/Vivek/MFC/Data/SG/combinedSG.tdf",header=TRUE,sep="\t")
>> SGm2 <- lm(A3Yr ~ A10Holdings, data=SG)
>> circlesPlot(A10Holdings,A3Yr, size=NetAssets)
>> abline(coef(SGm2)) # Put the regression line on the plot
>> SG <- read.table("c:/Vivek/MFC/Data/SG/combinedSG.tdf",header=TRUE,sep="\t")
>> summary(SG) #Outlier does not show in the summary
>> circlesPlot(A10Holdings,A3Yr, size=NetAssets) # ... But Plot still has the outlier
>> detach(SG)
>> attach(SG)
>> circlesPlot(A10Holdings,A3Yr, size=NetAssets) # Outlier is gone from the plot
>
> So, here are my questions:
> 1. Is there a simpler / more idiomatic way in R, than commenting out the
> data in the data file to exclude some outliers in the data (i.e. to do data
> trimming). In EViews this is done by setting the sample.
> 2. Is the "flushing" of the cache happening as a result of the
> detach/attach, or some other reason?
>
> Thanks for any help,
>
> Vivek Satsangi
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

PLEASE do: plain text, not HTML, is requested.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



