[R] Logical subset of the columns in a dataframe
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Jan 28 17:24:11 CET 2009
On Wed, 28 Jan 2009, Mark Na wrote:
> Hi R-helpers,
>
> I've been struggling with a problem for most of the day (!) so am finally
> resorting to R-help.
>
> I would like to subset the columns of my dataframe based on the frequency
> with which the columns contain non-zero values. For example, let's say that
> I want to retain only those columns which contain non-zero values in at
> least 1% of their rows.
>
> In Excel I would calculate a row at the bottom of my data sheet and use the
> following function
>
> =countif(range,">0")
>
> to identify the number of non-zero cells in each column. Then, I would
> divide that by the number of rows to obtain the frequency of non-zero values
> in each column. Then, I would delete those columns with frequencies < 0.01.
>
> But, I'd like to do this in R. I think the missing link is an analog to
> Excel's countif function. Any ideas?
Use something like
  DF[sapply(DF, function(x) mean(x > 0) >= 0.01)]
Since logical values are converted to 0/1, mean() of the logical vector
x > 0 gives the frequency of positive entries (and sum() the count).
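A minimal sketch of the idea on made-up data, in case it helps. The data
frame and its column names a, b, c are hypothetical, and the test x > 0 is
assumed to match the Excel criterion ">0" (use x != 0 if negative values
should also count as non-zero):

```r
# Hypothetical toy data: 200 rows, three numeric columns.
DF <- data.frame(
  a = c(rep(0, 199), 5),        # 1 positive value in 200 rows (0.5%)
  b = c(rep(0, 197), 1, 2, 3),  # 3 positive values in 200 rows (1.5%)
  c = rep(0, 200)               # no positive values
)

# x > 0 is a logical vector; sum() counts its TRUEs (the analogue of
# countif) and mean() gives the proportion of TRUEs.
counts <- sapply(DF, function(x) sum(x > 0))
freq   <- sapply(DF, function(x) mean(x > 0))

# Keep only the columns whose proportion of positive values is at least 1%.
DF2 <- DF[freq >= 0.01]
names(DF2)
```

If the columns may contain NAs, mean(x > 0, na.rm = TRUE) is one way to
ignore them; whether that is the right treatment depends on the data.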
>
> Thanks! Mark
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595