[R] how to remove rows in which 2 or more observations are smaller than a given threshold?

hind lazrak hindstata at gmail.com
Sun Feb 27 02:14:11 CET 2011


Dear Bill and Phil

Many thanks for your help, your solutions worked perfectly.
Bill: I did not specify whether the data was a matrix or a data frame
because it is in fact the expression data in an eset object (Biobase).
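
For an eset, the same filter can be computed on the expression matrix and the resulting logical vector used to subset the object. A minimal sketch, assuming a small numeric matrix `m` as a stand-in for `exprs(eset)`; the names `m`, `cutoff`, `keep` and `filtered` are illustrative and not from the thread:

```r
# Toy stand-in for the expression matrix. With Biobase loaded this would be
# m <- exprs(eset)   (assumption: eset is an ExpressionSet)
m <- matrix(c( 2.0,  -3.1, 4.2,
               0.5,   1.0, 5.0,
              -1.58,  2.5, 3.0,
               0.1,  -0.2, 0.3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("id", 1:4), paste0("s", 1:3)))

cutoff <- 1.58

# Count the "negligible" values (|value| <= cutoff) in each row
nSmall <- rowSums(abs(m) <= cutoff)

# Keep rows with fewer than 2 negligible values
keep <- nSmall < 2
filtered <- m[keep, , drop = FALSE]

# The same logical vector could subset the features of an ExpressionSet:
# eset_filtered <- eset[keep, ]
```

In this toy matrix only rows `id1` and `id3` survive; `id3` holds exactly one value at the cutoff, which counts as negligible under the `<=` rule.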

Thank you so much again!

Hind

On Sat, Feb 26, 2011 at 4:34 PM, William Dunlap <wdunlap at tibco.com> wrote:
> You didn't say if your data set was a matrix or data.frame.
> Here are 2 functions that do the job on either and one that
> only works with data.frames, but is faster (a similar speedup
> is available for matrices as well).  They all compute the
> number of small values in each row, nSmall, and extract the
> rows for which nSmall is less than 2.
>
> f0 <- function (x) {
>   nSmall <- apply(x, 1, function(row) sum(abs(row) <= 1.58))
>   x[nSmall<2, , drop = FALSE]
> }
> f1 <- function (x) {
>   nSmall <- rowSums(abs(x) <= 1.58)
>   x[nSmall<2, , drop = FALSE]
> }
> f2 <- function (x) {
>    stopifnot(is.data.frame(x))
>    nSmall <- 0
>    for (column in x) {
>        nSmall <- nSmall + (abs(column) <= 1.58)
>    }
>    x[nSmall < 2, , drop = FALSE]
> }
>
> For a 10^5 row by 50 column data.frame I got the
> following times:
>  > system.time(r0 <- f0(z))
>     user  system elapsed
>     2.39    0.04    2.51
>  > system.time(r1 <- f1(z))
>     user  system elapsed
>     0.42    0.08    0.51
>  > system.time(r2 <- f2(z))
>     user  system elapsed
>     0.21    0.05    0.24
>  > identical(r0, r1) && identical(r0, r2)
>  [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of hind lazrak
>> Sent: Saturday, February 26, 2011 3:37 PM
>> To: r-help at r-project.org
>> Subject: [R] how to remove rows in which 2 or more
>> observations are smaller than a given threshold?
>>
>> Hello
>>
>> The data set I am examining has 7425 observations (rows with unique
>> identifiers) and 46 samples (columns).
>>
>> I have been trying to generate a dataset that filters out observations
>> that are "negligible".
>> The definition of "negligible" is an absolute value less than or
>> equal to 1.58.
>>
>> The rule that I would like to adopt to create the new dataset is: drop rows
>> in which 2 or more observations have absolute values <= 1.58.
>>
>> Since I have a unique identifier per row, I have tried to reshape the
>> data so I could create a new variable using an ifelse statement that
>> would flag observations <= 1.58, but I am not getting anywhere with this
>> approach.
>>
>> I could not come up with an apply function that counts the number of
>> observations for which the absolute values are below the cutoff I've
>> specified.
>>
>> All observations are numerical and I don't have missing values.
>>
>>
>> Thank you in advance for the help,
>>
>> Hind
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
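
Bill's remark that a similar speedup is available for matrices might look roughly like the sketch below. This is a guess at what was meant, not code from his message: `f2m` is a hypothetical name, and the `cutoff` argument is added here for illustration. It accumulates the per-row count one column at a time, as `f2` does for data frames.

```r
# Hypothetical matrix counterpart of f2: loop over columns instead of rows
f2m <- function(x, cutoff = 1.58) {
  stopifnot(is.matrix(x), is.numeric(x))
  nSmall <- numeric(nrow(x))
  for (j in seq_len(ncol(x))) {
    # Add 1 for every row whose value in column j is negligible
    nSmall <- nSmall + (abs(x[, j]) <= cutoff)
  }
  x[nSmall < 2, , drop = FALSE]
}

# Check against the rowSums() approach on simulated data
set.seed(1)
m <- matrix(rnorm(1000 * 10, sd = 5), nrow = 1000)
ref <- m[rowSums(abs(m) <= 1.58) < 2, , drop = FALSE]
same <- identical(f2m(m), ref)
```

Whether the column loop actually beats `rowSums()` on a given matrix is worth checking with `system.time()` rather than assuming.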


