[R] Comparing two columns in an Excel file
arun
smartpink111 at yahoo.com
Wed Aug 14 15:57:19 CEST 2013
Hi,
Try:
set.seed(42)
dat1<- as.data.frame(matrix(sample(LETTERS,2*1e6,replace=TRUE),ncol=2))
dat2<- dat1[1:1e5,]
dat3<- dat1
library(data.table)
dt1<- data.table(dat1)
system.time(dat1$sat<- 1*(dat1[,1]==dat1[,2]))
# user system elapsed
# 0.148 0.004 0.152
library(car)
system.time({dat3$sat<- 1*(dat3[,1]==dat3[,2])
dat3$sat<- recode(dat3$sat,'0="no flag";1="flag"')})
# user system elapsed
# 1.140 0.000 1.137
head(dat3)
# V1 V2 sat
#1 X M no flag
#2 Y K no flag
#3 H W no flag
#4 V E no flag
#5 Q N no flag
#6 N K no flag
#or
system.time(dt1[,sat:=1*(V1==V2)])
# user system elapsed
# 0.104 0.000 0.103
identical(as.data.frame(dt1),dat1)
#[1] TRUE
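If only the text labels are needed, base R's ifelse() can probably produce them in one step, without the intermediate 0/1 column or recode(). A minimal sketch on a small made-up data frame (the name `toy` and its values are assumptions for illustration, taken from the example in the question):

```r
# Hypothetical toy data mirroring the three example rows in the question
toy <- data.frame(V1 = c("S", "A", "K"),
                  V2 = c("T", "L", "K"),
                  stringsAsFactors = FALSE)

# Vectorized comparison: both columns are compared in one call, no loop
toy$sat <- ifelse(toy$V1 == toy$V2, "flag", "no flag")
toy
```

The same line should work on the large data frame above; only the column names would change.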
#your method on a subset of dat1.
na1<- nrow(dat2)
sat <- c(rep(0,na1))
dat2 <- cbind(dat2,sat)
system.time({
  for(i in c(1:na1)){
    if( dat2[i,1] == dat2[i,2]) {
      dat2[i,3] <- 1
    }
  }
})
# user system elapsed
#18.756 0.000 18.792
identical(dat2,dat1[1:1e5,])
#[1] TRUE
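For the file described in the question, the same vectorized comparison can be written against columns 28 and 31 by position. This is only a sketch: the data frame below is a made-up stand-in for the real 48-column file, and treating an empty (NA) cell as "not equal" is an assumption, not something stated in the question.

```r
# Stand-in for the real file: 48 character columns, 5 rows (made-up values)
set.seed(1)
a1 <- as.data.frame(matrix(sample(LETTERS[1:4], 48 * 5, replace = TRUE), ncol = 48),
                    stringsAsFactors = FALSE)
a1[2, 28] <- NA   # simulate an empty cell in column 28

# Compare as character so that differing factor level sets cannot cause an error
same <- as.character(a1[[28]]) == as.character(a1[[31]])

# A comparison involving NA yields NA; count it as "not equal" (assumption)
a1$sat <- as.integer(!is.na(same) & same)
```

Note that the row-by-row loop would stop with an error on the NA row, because if() cannot evaluate a missing value.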
A.K.
Hi,
I have received NGS (next generation sequencing) data in an Excel file and would like to flag rows with synonymous mutations.
The Excel file has 48 columns, and my columns of interest are the 28th and 31st.
Both columns contain a one-letter amino acid code, and I'd like to flag a row when the two columns hold the same letter.
Below is an example:
28th column   31st column   sat
S             T             no flag
A             L             no flag
K             K             flag
Here is the code I made and please don't laugh at it. I just started R two weeks ago.
#_______________________________________
a1 <- read.csv(file.choose(),header=TRUE)
na1 <- nrow(a1)
sat <- c(rep(0,na1))
a1[,28] <- as.character(a1[,28])
a1[,31] <- as.character(a1[,31])
a1 <- cbind(a1,sat)
for(i in c(1:na1)){
  if( a1[i,28] == a1[i,31]) {
    a1[i,49] <- 1
  }
}
write.csv(a1,file.choose(), row.names = FALSE)
#_______________________________________
I test-ran this code on a test Excel file with 30 rows without any problem.
But a problem arose when I ran it on an NGS Excel file with more than 80,000 rows: it ran forever.
Does anybody know how to shorten the running time?
Any input would be appreciated.
Thanks.
SY