[R] extracting values from txt file that follow user-supplied quote
emorway
emorway at usgs.gov
Fri Jun 8 16:30:06 CEST 2012
I'll summarize the total run time of each suggestion made in this thread and
post the code for anyone who comes across this thread in the future. First
the results (the code follows):
My own attempt, based on suggestions from Bert and Dan:
t1
# user system elapsed
# 208.21 1.68 210.34
Gabor's suggested code:
t2
# user system elapsed
# 51.12 0.63 51.75
Rui's suggested code:
t3a # (get the number of lines)
# user system elapsed
# 45.13 11.08 56.23
t3b # (now run the function)
# user system elapsed
# 50.59 0.55 51.16
In summary, Gabor's and Rui's code have similar run times when the number of
lines in the file is known a priori (t2 is roughly equal to t3b). If the
lines must first be counted (t3a), Rui's approach takes roughly twice as long
overall (56.23 + 51.16 = 107.39 s elapsed, versus 51.75 s for t2). Gabor's
code also seems a little more robust, since it doesn't require the total
number of lines in the file to be supplied. Here is the code used to get
these times (note that I used the full 1 GB text file, not the reduced
version attached to the first post of this thread):
#----------------
#modified attempt
#----------------
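library(gsubfn)   # provides strapplyc()
library(tcltk2)
p <- "[-0-9]\\S+" # a '-' or digit followed by one or more non-blank characters
pd <- numeric()
txt_con <- file(description = "D:/MCR_BeoPEST - Copy/MCR.out", open = "r")
t1 <- system.time(
  # read one line per call; keep every number found on lines containing "DISCREPANCY = "
  while (length(txt_line <- readLines(txt_con, n = 1))) {
    if (length(grep("DISCREPANCY = ", txt_line))) {
      pd <- c(pd, as.numeric(strapplyc(txt_line, p)[[1]]))
    }
  }
)
close(txt_con)
t1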
# user system elapsed
# 208.21 1.68 210.34
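The pattern `p` above keeps every run that starts with a minus sign or a digit and continues with at least one non-blank character, so it returns all numbers on a matching line, not only the discrepancy. The sketch below is a rough base-R stand-in for `strapplyc(txt_line, p)[[1]]` using `regmatches()`/`gregexpr()` with `perl = TRUE` (an assumption: it should behave the same for plain ASCII lines like these). The helper name `extract_numbers` and the sample lines are made up, not taken from MCR.out. Note that a lone single-digit value such as `0` would not match, because the pattern needs at least two characters.

```r
# Hypothetical helper: base-R stand-in for strapplyc(line, p)[[1]]
extract_numbers <- function(line, p = "[-0-9]\\S+") {
  # every run starting with '-' or a digit followed by 1+ non-blank chars
  as.numeric(regmatches(line, gregexpr(p, line, perl = TRUE))[[1]])
}

# Made-up line in the spirit of the budget output (not copied from MCR.out)
x <- "  IN - OUT =  12.5     PERCENT DISCREPANCY = -0.02"
extract_numbers(x)
# [1] 12.50 -0.02   (the "-" in "IN - OUT" is followed by a blank, so it is skipped)

extract_numbers("PERCENT DISCREPANCY = 0")
# numeric(0): a lone "0" is too short for the pattern
```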
#----------------------------
#Suggested by G. Grothendieck
#----------------------------
g <- function(txt_con, string, from, to, ...) {
  L <- readLines(txt_con)                        # read the whole file in one call
  matched <- grep(string, L, value = TRUE, ...)  # keep only the matching lines
  as.numeric(substring(matched, from, to))       # value sits in fixed columns from:to
}
txt_con <- file(description = "D:/MCR_BeoPEST - Copy/MCR.out", open = "r")
t2 <- system.time(
  edm <- g(txt_con, "PERCENT DISCREPANCY = ", 70, 78, fixed = TRUE)
)
close(txt_con)
t2
# user system elapsed
# 51.12 0.63 51.75
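g() (and Rui's function below) relies on the value occupying character columns 70 to 78 of each matching line. The sketch below builds a made-up 78-character line with `sprintf()` to show what `substring(..., 70, 78)` returns; the label text and padding are assumptions, not the real MCR.out layout. Leading blanks are harmless because `as.numeric()` ignores surrounding whitespace, but a value wider than 9 characters, or a layout change that shifts the columns, would be truncated or misread.

```r
# Hypothetical fixed-width line: label left-justified in 69 characters,
# value right-justified in the 9 characters that occupy columns 70-78
make_line <- function(value) sprintf("%-69s%9s", "     PERCENT DISCREPANCY =", value)

line <- make_line("-0.02")
nchar(line)                          # 78
substring(line, 70, 78)              # "    -0.02"
as.numeric(substring(line, 70, 78))  # -0.02 (leading blanks are ignored)
```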
#-------------------------
#Suggested by Rui Barradas
#-------------------------
library(R.utils)  # provides countLines()
t3a <- system.time(num_lines <- countLines("D:/MCR_BeoPEST - Copy/MCR.out"))
t3a
# user system elapsed
# 45.13 11.08 56.23
fun <- function(con, pattern, nlines, n = 5000L) {
  # open (and later close) the file if a file name was passed
  if (is.character(con)) {
    con <- file(con, open = "rt")
    on.exit(close(con))
  }
  passes <- nlines %/% n    # number of full chunks of n lines
  remaining <- nlines %% n  # lines left over after the full chunks
  res <- NULL
  for (i in seq_len(passes)) {
    txt <- readLines(con, n = n)
    res <- c(res, as.numeric(substr(txt[grepl(pattern, txt)], 70, 78)))
  }
  if (remaining) {
    txt <- readLines(con, n = remaining)
    res <- c(res, as.numeric(substr(txt[grepl(pattern, txt)], 70, 78)))
  }
  res
}
txt_con <- file(description = "D:/MCR_BeoPEST - Copy/MCR.out", open = "r")
pat <- "PERCENT DISCREPANCY ="
num_lines <- 14405247L  # total number of lines in MCR.out, hard-coded here
t3b <- system.time(pd2 <- fun(txt_con, pat, num_lines, 100000L))
close(txt_con)
t3b
# user system elapsed
# 50.59 0.55 51.16
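Rui's function needs `nlines` only to decide how many chunks to read. A variant that keeps calling `readLines()` until it returns nothing would read at most `n` lines at a time without counting the lines first, which addresses the robustness point in the summary. The sketch below is an untimed, hypothetical variant: `fun_chunked` and its `from`/`to` arguments are my own names, it collects per-chunk results in a list instead of growing `res`, and it matches the pattern with `fixed = TRUE`. The in-memory test lines are made up.

```r
# Hypothetical variant of fun(): no line count needed.
fun_chunked <- function(con, pattern, n = 5000L, from = 70, to = 78) {
  if (is.character(con)) {             # open (and later close) a file name
    con <- file(con, open = "rt")
    on.exit(close(con))
  }
  chunks <- list()
  repeat {
    txt <- readLines(con, n = n)       # next block of up to n lines
    if (length(txt) == 0L) break       # end of input reached
    hits <- txt[grepl(pattern, txt, fixed = TRUE)]
    chunks[[length(chunks) + 1L]] <- as.numeric(substr(hits, from, to))
  }
  as.numeric(unlist(chunks))           # numeric(0) if nothing matched
}

# Small in-memory check with made-up fixed-width lines (value in columns 70-78)
make_line <- function(value) sprintf("%-69s%9s", "     PERCENT DISCREPANCY =", value)
lines <- c("header", make_line("0.01"), "other", make_line("-0.02"), "trailer")
con <- textConnection(lines)
fun_chunked(con, "PERCENT DISCREPANCY =", n = 2L)  # returns c(0.01, -0.02)
close(con)
```

On the real file it would be called as `fun_chunked("D:/MCR_BeoPEST - Copy/MCR.out", "PERCENT DISCREPANCY =", n = 100000L)`; I have not timed it against t2 or t3b.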
--
View this message in context: http://r.789695.n4.nabble.com/extracting-values-from-txt-file-that-follow-user-supplied-quote-tp4632558p4632810.html
Sent from the R help mailing list archive at Nabble.com.