[R] Data Extraction

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Thu Nov 22 16:50:47 CET 2012


Hi Berend,

You have compared all three approaches and evaluated them very nicely.

Thanks and regards,

Pradip Muhuri

Beginner UseR

________________________________________
From: Berend Hasselman [bhh at xs4all.nl]
Sent: Thursday, November 22, 2012 9:49 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help at r-project.org
Subject: Re: [R] Data Extraction

On 22-11-2012, at 15:11, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> I would appreciate it if someone could help me resolve the following:
>
> 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
>
> 2. Is this message harmful?  The following object(s) are masked from 'df1 (position 3)':
>    X1, X2, X3, X4, X5
>
> Thanks,
>
> Pradip Muhuri
>
>
> #Reproducible Example
> set.seed(5)
> df1<-data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
> attach (df1)
> # delete rows where X1 is NA
> df1[!is.na( X1),][,1:5] # This works
>
> # delete rows where any of X1, X2, X3, X4 or X5 is NA
> df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
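
As to why the last line does not do what you want: in R, NA | TRUE is TRUE, and any non-zero number counts as TRUE. So X1 | X2 | ... is NA only when none of the columns holds a non-zero value, which in your data means only when all of them are NA. A minimal sketch of this, using a small made-up data frame d (not your df1) and with() instead of attach():

```r
# Small hypothetical data frame, used only for illustration
d <- data.frame(X1 = c(1, NA, 3, NA), X2 = c(2, 5, NA, NA))

# NA | TRUE is TRUE, so a non-zero value in one column hides the NA in the other
with(d, X1 | X2)                      # TRUE TRUE TRUE NA
keep_or <- with(d, !is.na(X1 | X2))   # drops only the all-NA row

# Test each column separately and combine the tests with &
keep_and <- with(d, !is.na(X1) & !is.na(X2))

d[keep_or, ]    # rows 1 to 3 survive, NAs included
d[keep_and, ]   # only row 1 survives
```

Using with() also sidesteps attach(). The "masked" message typically appears when the same data frame has been attached more than once, so the names X1, ..., X5 are found at two positions on the search path.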

Yet another way of doing this is

df1[!is.na(rowSums(df1)),][1:5]

But Petr's solution appears to be the quickest (it is f3 in the benchmark below).
See this:

> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
> f2 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
> f3 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
>
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.675   13.172          100
2 d2 <- f2(df)   0.401    1.437          100
3 d3 <- f3(df)   0.279    1.000          100

> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
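
One caveat that the timing does not show: the rowSums() approach relies on all columns being numeric, as they are in the benchmark data. A rough sketch with a made-up mixed-type data frame m (the names m, id, x and y are only for illustration) suggests that complete.cases() is the more general choice:

```r
# Hypothetical data frame with one character column and two numeric columns
m <- data.frame(id = c("a", "b", "c"),
                x  = c(1, NA, 3),
                y  = c(4, 5, NA),
                stringsAsFactors = FALSE)

# rowSums() requires numeric input, so it stops with an error here;
# tryCatch() records that instead of aborting the script
rs <- tryCatch(rowSums(m), error = function(e) "error")

# complete.cases() handles mixed column types directly
cc <- m[complete.cases(m), ]

# rowSums() still works if it is restricted to the numeric columns
num   <- sapply(m, is.numeric)
rs_ok <- m[!is.na(rowSums(m[num])), ]
```

Note that the restricted rowSums() version only checks the numeric columns, so an NA in a character column would go unnoticed.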


Berend



