[R] Data Extraction

Thu Nov 22 16:45:06 CET 2012

Hi Bert,

Your solution is similar to Petr's.

Thanks and regards,

Pradip Muhuri
BeginneR UseR

________________________________________
From: Bert Gunter [gunter.berton at gene.com]
Sent: Thursday, November 22, 2012 10:20 AM
To: Berend Hasselman
Cc: Muhuri, Pradip (SAMHSA/CBHSQ); r-help at r-project.org
Subject: Re: [R] Data Extraction

Unnecessarily complicated. ?na.omit (linked from ?complete.cases)

df <- na.omit(df)
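
A minimal sketch of how na.omit() relates to subsetting with complete.cases(), assuming a small made-up data frame (the name `toy` and its values are hypothetical, not from the thread). Both keep the same rows; na.omit() also attaches an "na.action" attribute recording the dropped rows, so identical() on the two results would be expected to return FALSE even though the data match.

```r
# Hypothetical toy data frame; names and values are for illustration only.
toy <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

kept_omit <- na.omit(toy)                 # drops rows 2 and 3
kept_cc   <- toy[complete.cases(toy), ]   # same rows, via a logical index

# The same rows survive either way.
stopifnot(identical(rownames(kept_omit), rownames(kept_cc)))

# na.omit() additionally records which rows it removed.
attr(kept_omit, "na.action")
```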

-- Bert

On Thu, Nov 22, 2012 at 6:49 AM, Berend Hasselman <bhh at xs4all.nl> wrote:

On 22-11-2012, at 15:11, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> I would appreciate it if someone could help me resolve the following:
>
> 1. df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work
>
> 2. Is this message harmful?  The following object(s) are masked from 'df1 (position 3)':
>    X1, X2, X3, X4, X5
>
> Thanks,
>
> Pradip Muhuri
>
>
> #Reproducible Example
> set.seed(5)
> df1<-data.frame(matrix(sample(c(1:10,NA),100,replace=TRUE),ncol=5))
> attach (df1)
> #delete rows where X1 is NA
> df1[!is.na( X1),][,1:5] # This works
>
> #delete rows where any of X1, X2, X3, X4 or X5 is NA
> df1[!is.na( X1 | X2 | X3 | X4 | X5),][,1:5] # This does not work

Yet another way of doing this is

df1[!is.na(rowSums(df1)),][1:5]
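
A short sketch of why the `|` version fails while the rowSums() version works, assuming a made-up two-column data frame (`toy`, `bad` and `good` are hypothetical names). For non-zero numbers, `NA | x` evaluates to TRUE, so the NA is gone before is.na() sees it; arithmetic such as `+` propagates NA instead.

```r
# Logical OR with a non-zero number swallows the NA; addition propagates it.
NA | 5          # TRUE
is.na(NA | 5)   # FALSE
is.na(NA + 5)   # TRUE

# Hypothetical data: row 1 has one NA, row 2 is all NA, row 3 is complete.
toy <- data.frame(X1 = c(1, NA, 3), X2 = c(NA, NA, 6))

bad  <- !is.na(toy$X1 | toy$X2)   # TRUE FALSE TRUE: row 1 is wrongly kept
good <- !is.na(rowSums(toy))      # FALSE FALSE TRUE: only the complete row

toy[good, ]
```

Note that rowSums() requires all columns to be numeric; complete.cases() does not have that restriction.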

But Petr's solution appears to be quickest.
See this:

> N <- 100000
> set.seed(13)
> df <- data.frame(matrix(sample(c(1:10,NA),N,replace=TRUE),ncol=50))
> library(rbenchmark)
>
> f1 <- function(df) {df[apply(df, 1, function(x)all(!is.na(x))),][,1:ncol(df)]}
> f2 <- function(df) {df[!is.na(rowSums(df)),][1:ncol(df)]}
> f3 <- function(df) {df[complete.cases(df),][1:ncol(df)]}
>
> benchmark(d1 <- f1(df), d2 <- f2(df), d3 <- f3(df), columns=c("test","elapsed", "relative", "replications"))
          test elapsed relative replications
1 d1 <- f1(df)   3.675   13.172          100
2 d2 <- f2(df)   0.401    1.437          100
3 d3 <- f3(df)   0.279    1.000          100

> identical(d1,d2)
[1] TRUE
> identical(d1,d3)
[1] TRUE
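
For readers without the rbenchmark package, a rough base-R sketch of the same comparison using system.time(), assuming the same simulated data as above. The simplified f2 and f3 here drop the trailing column index, which is an assumption of this sketch; timings depend on the machine and R version, so none are shown.

```r
set.seed(13)
N <- 100000
df <- data.frame(matrix(sample(c(1:10, NA), N, replace = TRUE), ncol = 50))

# Simplified variants of f2 and f3 (no trailing column selection).
f2 <- function(df) df[!is.na(rowSums(df)), ]
f3 <- function(df) df[complete.cases(df), ]

# Repeat each call 100 times, mirroring the replications used above.
system.time(for (i in 1:100) d2 <- f2(df))
system.time(for (i in 1:100) d3 <- f3(df))

# Both approaches should select the same rows.
stopifnot(identical(d2, d3))
```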

Berend


--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm