[R] Deleting specific rows from a dataframe
arun
smartpink111 at yahoo.com
Tue Jul 16 05:00:54 CEST 2013
You mentioned a data.frame in one place and a matrix in another. A matrix would be faster.
#Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))
system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
# user system elapsed
# 0.628 0.020 0.649
dim(res)
#[1] 952 5
set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
# user system elapsed
# 0.188 0.004 0.194
dim(res1)
#[1] 952 5
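If the data start out as a data frame, one possible way to get the matrix speed is to convert once with as.matrix() and apply the resulting logical index to the original object. This is only a rough sketch, assuming every column is character so the conversion does not alter the values; `dfSmall` is a made-up stand-in for `dfTest`, and this variant was not timed here.

```r
# Hypothetical small data frame, for illustration only
set.seed(1)
dfSmall <- as.data.frame(matrix(sample(c("P", "A"), 40, replace = TRUE), ncol = 2),
                         stringsAsFactors = FALSE)

m    <- as.matrix(dfSmall)                  # one-off conversion to a character matrix
keep <- rowSums(m == "P") == ncol(m)        # same test as above, evaluated on the matrix
resSmall <- dfSmall[keep, , drop = FALSE]   # index the data frame, so the result stays a data frame

# Cross-check against the direct data-frame version
identical(resSmall,
          dfSmall[rowSums(dfSmall == "P") == ncol(dfSmall), , drop = FALSE])
```

When subsetting a matrix directly, `drop = FALSE` also keeps the result two-dimensional if only one row happens to match.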
#Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
# user system elapsed
# 5.988 0.120 6.120
identical(res,res3)
#[1] TRUE
system.time(res2<- dfTest[apply(dfTest,1, function(x) all(length(table(x))==ncol(dfTest) | names(table(x))=="P") ), ])
# user system elapsed
#351.492 0.040 352.164
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
identical(res2,res3)
#[1] TRUE
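Another vectorised variant, not timed in this thread, compares one column at a time and combines the results with Reduce(), so the full logical matrix is never built. A minimal sketch on a hypothetical data frame `dfDemo`; whether it is faster than rowSums() would need checking with system.time().

```r
# Hypothetical example data, for illustration only
dfDemo <- data.frame(s1 = c("P", "P", "Q"),
                     s2 = c("P", "O", "P"),
                     stringsAsFactors = FALSE)

# One logical vector per column, then an element-wise AND across columns
keep <- Reduce(`&`, lapply(dfDemo, function(col) col == "P"))
resDemo <- dfDemo[keep, , drop = FALSE]   # only the first row is all "P"
resDemo
```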
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Chirag Gupta <cxg040 at email.uark.edu>
Cc: R help <r-help at r-project.org>
Sent: Monday, July 15, 2013 9:23 PM
Subject: Re: [R] Deleting specific rows from a dataframe
Hi,
If I understand it correctly,
df1<- read.table(text="
sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c       P       P       P       P       P
#d       P       P       P       P       P
#f       P       P       P       P       P
#h       P       P       P       P       P
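For anyone unfamiliar with the idiom, the one-liner can be unpacked into steps. The following is only an illustrative sketch on a smaller, hypothetical data frame (`toy` and the intermediate names are not from the thread):

```r
# Hypothetical three-row example, for illustration only
toy <- data.frame(s1 = c("P", "P", "A"),
                  s2 = c("P", "M", "P"),
                  s3 = c("P", "P", "P"),
                  row.names = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

is_p <- toy == "P"           # logical matrix: TRUE where a cell equals "P"
n_p  <- rowSums(is_p)        # number of "P" cells in each row (a = 3, b = 2, c = 2)
keep <- n_p == ncol(toy)     # TRUE only for rows that are "P" in every column
toy[keep, , drop = FALSE]    # keeps row "a" only
```

If the data can contain NA, rowSums() returns NA for those rows; using `rowSums(is_p, na.rm = TRUE)` instead would treat a missing cell as "not P" and drop the row.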
A.K.
----- Original Message -----
From: Chirag Gupta <cxg040 at email.uark.edu>
To: r-help at r-project.org
Cc:
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe
I have a data frame like the one shown below:
  sample1 sample2 sample3 sample4 sample5
a P       P       I       P       P
b P       A       P       P       A
c P       P       P       P       P
d P       P       P       P       P
e M       P       M       A       P
f P       P       P       P       P
g P       P       P       A       P
h P       P       P       P       P
I want to keep only those rows which have all "P" across all the columns.
Since the matrix is large (about 20,000 rows), I cannot do it in Excel.
Is there any special function that I can use?
--
*Chirag Gupta*