[R] Batch replacement, by factor, of values in a data frame

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Aug 26 19:37:12 CEST 2009


On Wed, 2009-08-26 at 08:06 -0700, Phil Spector wrote:
> The ave function is very handy for things like this:
> 
> mins = ave(D$Var,D$Site,FUN=function(x)min(x[x>0],na.rm=TRUE))
> D$Var = ifelse(is.na(D$Var) | D$Var == 0,mins,D$Var)
> 
> should do the required replacements.
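To make the mechanics of Phil's suggestion concrete, here is a small sketch on made-up toy vectors (`site` and `var` are hypothetical, not the thread's data). `ave()` returns a vector as long as its input, with each element replaced by `FUN` applied to that element's group. Note that Phil's `ifelse()` test also fills the NA's with the minimum. If only the 0's should change, a variant that leaves NA's alone is shown as well; that variant is my own adjustment, not part of the original reply.

```r
## Toy data (hypothetical values, not the thread's dummy data)
site <- factor(c("A", "A", "B", "B", "B"))
var  <- c(0, 0.4, 0.7, 0, NA)

## ave() applies FUN within each level of 'site' and returns a vector the
## same length as 'var', each element holding its own group's result
mins <- ave(var, site, FUN = function(x) min(x[x > 0], na.rm = TRUE))
## expected: 0.4 0.4 0.7 0.7 0.7

## Phil's test: replaces zeros AND missing values
both <- ifelse(is.na(var) | var == 0, mins, var)
## expected: 0.4 0.4 0.7 0.7 0.7

## Variant (assumption: NA's should be kept): replace only non-missing zeros
zero_only <- ifelse(!is.na(var) & var == 0, mins, var)
## expected: 0.4 0.4 0.7 0.7 NA
```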

Thanks Phil, that's great. I hadn't come across ave() before.

Shortly after I received your email, it dawned on me that I could just rep()
the minimums the required number of times for each site (once per 0 among
the non-missing values), and then insert this vector into the data frame,
subsetting on whether each value was zero or not.
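A minimal sketch of that rep() idea, using a small hypothetical data frame rather than the thread's example. The choice of `tapply()`, `which()` and `table()` is my own illustration, not necessarily the code that was actually used. It assumes the rows are already sorted by Site, as the dummy data below are, so that the zeros appear in the same Site order as the repeated minimums.

```r
## Hypothetical data, sorted by Site as in the thread's example
D <- data.frame(Site = factor(c("A", "A", "A", "B", "B", "B")),
                Var  = c(0, 0.2, 0.5, NA, 0, 0.3))

## Per-Site minimum of the non-missing, non-zero values
mins <- tapply(D$Var, D$Site, function(x) min(x[x > 0], na.rm = TRUE))

## Row positions of the zeros; which() drops the NA's that D$Var == 0 yields,
## which matters because NA's are not allowed in a subscripted assignment
zeros <- which(D$Var == 0)

## Number of zeros in each Site (table() keeps levels with no zeros as 0)
nzero <- table(D$Site[zeros])

## rep() each minimum once per zero and insert the vector; correct only
## because the rows are sorted by Site
D$Var[zeros] <- rep(as.vector(mins), times = as.vector(nzero))
## expected D$Var: 0.2 0.2 0.5 NA 0.3 0.3
```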

Cheers,

G

> 
>  					- Phil Spector
>  					 Statistical Computing Facility
>  					 Department of Statistics
>  					 UC Berkeley
>  					 spector at stat.berkeley.edu
> 
> 
> On Wed, 26 Aug 2009, Gavin Simpson wrote:
> 
> > Dear List,
> >
> > I'm wondering if there is a better/cleaner/more efficient way of
> > replacing 0 values in a variable with the minimum of the non-missing and
> > non-zero values of that same variable, but doing it within the levels of
> > a factor?
> >
> > Consider the dummy example data presented at the end of my message.
> > Within each 'Site' there are some 0 values and possibly some NA's. I can
> > compute the minimum of the non-missing and non-zero values by 'Site',
> > using aggregate() for example, as shown below. Short of looping over the
> > 'Site's and replacing the 0's with the relevant minimum, is there a
> > vectorised way to do the replacement?
> >
> > Thanks in advance,
> >
> > G
> >
> > ## dummy data
> > set.seed(123)
> > D <- data.frame(Site = factor(rep(LETTERS[1:5], times = 10)),
> >                Var = runif(5*10))
> > D <- D[with(D, order(Site, Var)), ]
> > ## simulate some 0's
> > D[c(1,3,11,12,23,27,34,36,41,49), "Var"] <- 0
> > ## just to complicate matters, some NA
> > D[sample(NROW(D), 3), "Var"] <- NA
> > head(D)
> > ## Compute minimums per Site
> > aggregate(D$Var, by = list(Site = D$Site),
> >          FUN = function(x) min(x[x>0], na.rm = TRUE))
> > ## How do I replace each Site's 0's with that Site's minimum?
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> > Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
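One more vectorised option, offered as an alternative sketch rather than anything proposed in the thread: take the per-Site table that the aggregate() call in the question already produces and look up each zero's Site with match(). Unlike the rep() idea, this does not depend on the rows being sorted by Site. It also replaces only the 0's and leaves the NA's in place. It rebuilds the dummy data from the question so that it runs on its own.

```r
## Rebuild the dummy data from the question
set.seed(123)
D <- data.frame(Site = factor(rep(LETTERS[1:5], times = 10)),
                Var = runif(5*10))
D <- D[with(D, order(Site, Var)), ]
D[c(1,3,11,12,23,27,34,36,41,49), "Var"] <- 0
D[sample(NROW(D), 3), "Var"] <- NA

## Per-Site minimums, exactly as in the question (columns 'Site' and 'x')
mins <- aggregate(D$Var, by = list(Site = D$Site),
                  FUN = function(x) min(x[x > 0], na.rm = TRUE))

## Positions of the non-missing zeros
zeros <- which(D$Var == 0)

## match() finds each zero's Site in the lookup table, so row order
## does not matter; NA's are left untouched
D$Var[zeros] <- mins$x[match(D$Site[zeros], mins$Site)]
```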