[R] Narrowing values collected from .txt file
Morway, Eric
emorway at usgs.gov
Wed Aug 28 20:28:11 CEST 2013
A relatively concise, commented, working solution to the problem that
originally motivated this thread is given below. I suspect the approach
has a major inefficiency: the "scan" call inside the function "g". As the
code stands, it re-opens and re-reads the file length(matched) times,
skipping down from the top on every pass, rather than reading sequentially
through to the next pertinent section of the txt file. Does anyone have a
more efficient approach in mind, so that I don't have to wait half an hour
for the results? (The only adjustment needed to run the code that follows
is to point "txt" to wherever the attached file is placed.)
# The file that the code works on is attached as MCR_Budgets.zip (76MB
# uncompressed).
# Where is the file? (The original dat file is ~147MB; only half of that
# file is attached.)
txt<-"c:/temp/MCR_Budgets.txt"
# Demarcation header to narrow list of retrieved 'Recharge' values
hdr_str<-"Flow Budget for Zone 2"
# string to identify lines with desired values
srch_str<-" RECHARGE ="
# retrieves desired values
g<-function(txt_con, hdr_str, srch_str, from, to, ...) {
  L <- readLines(txt_con)
  #matched contains the line #s w/ hdr_str
  matched <- grep(hdr_str, L, value = FALSE, ...)
  #initialize output vector
  fetched_list<-numeric()
  #for each instance of hdr_str, loop (seq_along also copes with no matches)
  for(i in seq_along(matched)){
    #retrieve a section of text following each hdr_str; suspect this is
    #highly inefficient!!!
    snippet<-scan(txt_con, what=character(), skip=matched[i]-1, n=42,
                  sep='\n')
    #get the two lines containing 'srch_str' within the short section of
    #retrieved text
    fetched <- grep(srch_str, snippet, value=TRUE)
    #append output vector for plotting time series
    fetched_list <- c(fetched_list,
                      as.numeric(substring(fetched, from, to)))
    #monitor
    print(i)
  }
  #return desired values
  as.numeric(fetched_list)
}
#The results of system.time reflect the fact that the function was run on
#the full 147 MB file, only half of which is attached.
system.time(
rech_z2<-g(txt,hdr_str,srch_str,37,51)
)
# user system elapsed
#1740.48 36.08 1825.77
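
One possible way around the repeated scan() calls, offered as a minimal
sketch rather than something tested on the full 147 MB file: readLines() has
already pulled every line into L, so the section after each header can be
taken by indexing L directly instead of re-reading the file. The names
"g_indexed" and "window" are hypothetical and introduced here only for
illustration. Note that scan() skips blank lines by default, whereas the
window below counts every line of L, so the value 42 may need adjusting.

```r
# Sketch only: 'g_indexed' and 'window' are illustrative names, not part of
# the original code. The file is read once; no scan() inside a loop.
g_indexed <- function(txt_con, hdr_str, srch_str, from, to, window = 42) {
  # read the whole file a single time
  L <- readLines(txt_con)
  # line numbers of every header occurrence (literal match, not a regex)
  matched <- grep(hdr_str, L, fixed = TRUE)
  if (length(matched) == 0L) return(numeric(0))
  # line numbers of the 'window' lines starting at each header,
  # clipped at the end of the file
  idx <- unlist(lapply(matched, function(m) {
    seq(m, min(m + window - 1, length(L)))
  }))
  # drop duplicates in case two windows overlap
  idx <- unique(idx)
  # within those windows, keep the lines containing srch_str
  fetched <- grep(srch_str, L[idx], fixed = TRUE, value = TRUE)
  # cut the fixed-width value out of each line and convert it
  as.numeric(substring(fetched, from, to))
}
```

The call would have the same shape as the one above, e.g.
rech_z2 <- g_indexed(txt, hdr_str, srch_str, 37, 51). No timing is claimed
here; the only point is that the file is read once rather than once per
header.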
On Wed, Aug 21, 2013 at 6:50 AM, Morway, Eric wrote:
>
> The output generated from a groundwater model post-processor contains
> millions of lines of text. Using the custom R function shown below, I
> can quickly gather values from this file.
>
> As you can see in the textConnection provided below (which is only a
> small snippet from the file), the output is repetitive but does have some
> header lines I hope to make use of to narrow the collected output. The
> header lines I'm speaking of are:
> 1) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 2"
> 2) " Flow Budget for Zone 2 at Time Step 1 of Stress Period 2"
> 3) " Flow Budget for Zone 3 at Time Step 1 of Stress Period 2"
> 4) " Flow Budget for Zone 1 at Time Step 1 of Stress Period 3"
> ...
>
> and so on for 111 different "zones" as well as 575 distinct "stress
> periods". In the custom function that follows, currently named "g", I can
> collect all values of "Recharge". If instead I want to restrict the
> collected "Recharge" values to "Zone 2" for all 575 stress periods, is
> there a way to first look for the header "Flow Budget for Zone 2", collect
> only the next two values of Recharge, and then skip down to the next
> header containing "Zone 2", collect 2 more values of "Recharge", and on
> like this to the end? 'Peeling' out targeted flow budget terms will
> facilitate generation of budget-specific plots through time.
>
> The "edm" variable at the end of the R code that follows currently looks
> like this:
> edm
> # [1] 1.28980e+05 0.00000e+00 *2.74161e-01* 0.00000e+00 8.10840e+04 0.00000e+00
> # [7] 1.28980e+05 0.00000e+00 *2.74165e-01* 0.00000e+00 8.10840e+04 0.00000e+00
>
> but with the proposed revision, which only collects Recharge values from
> Zone 2, it would look like:
> edm
> # [1] *2.74161e-01* 0.00000e+00 *2.74165e-01* 0.00000e+00
>
>
> txt_con<-textConnection(" mark_zone
>
>
> Flow Budget for Zone 1 at Time Step 1 of Stress Period 2
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.37855E-02
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.12898E+06
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 16 to 1 = 0.0000
> Zone 31 to 1 = 0.0000
> Zone 40 to 1 = 0.0000
> Zone 91 to 1 = 0.0000
>
> Total IN = 0.12898E+06
>
> OUT:
> ----
> STORAGE = 0.58275E-04
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 1 to 16 = 399.88
> Zone 1 to 31 = 85204.
> Zone 1 to 40 = 12404.
> Zone 1 to 91 = 30968.
>
> Total OUT = 0.12898E+06
>
> IN - OUT = 0.14138E-03
>
> Percent Discrepancy = 0.00
> 1
> mark_zone
>
>
> Flow Budget for Zone 2 at Time Step 1 of Stress Period 2
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.18833E-05
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.274161E+06
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 15 to 2 = 0.0000
> Zone 31 to 2 = 0.0000
> Zone 91 to 2 = 13134.
>
> Total IN = 0.28729E+06
>
> OUT:
> ----
> STORAGE = 0.10823E-04
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 2 to 15 = 6812.7
> Zone 2 to 31 = 0.20820E+06
> Zone 2 to 91 = 72274.
>
> Total OUT = 0.28729E+06
>
> IN - OUT = 0.58504E-02
>
> Percent Discrepancy = 0.00
> 1
> mark_zone
>
>
> Flow Budget for Zone 3 at Time Step 1 of Stress Period 2
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.84894E-04
> CONSTANT HEAD = 0.0000
> RECHARGE = 81084.
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 31 to 3 = 0.0000
> Zone 91 to 3 = 1234.9
>
> Total IN = 82319.
>
> OUT:
> ----
> STORAGE = 0.0000
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 3 to 31 = 53937.
> Zone 3 to 91 = 28382.
>
> Total OUT = 82319.
>
> IN - OUT = 0.81732E-03
>
> Percent Discrepancy = 0.00
> 1
> mark_zone
>
>
> Flow Budget for Zone 1 at Time Step 1 of Stress Period 3
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.15770E-04
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.12898E+06
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 16 to 1 = 0.0000
> Zone 31 to 1 = 0.0000
> Zone 40 to 1 = 0.0000
> Zone 91 to 1 = 0.0000
>
> Total IN = 0.12898E+06
>
> OUT:
> ----
> STORAGE = 0.38262E-02
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 1 to 16 = 399.88
> Zone 1 to 31 = 85214.
> Zone 1 to 40 = 12405.
> Zone 1 to 91 = 30958.
>
> Total OUT = 0.12898E+06
>
> IN - OUT = 0.88928E-03
>
> Percent Discrepancy = 0.00
> 1
> mark_zone
>
>
> Flow Budget for Zone 2 at Time Step 1 of Stress Period 3
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.0000
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.274165E+06
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 15 to 2 = 0.0000
> Zone 31 to 2 = 0.0000
> Zone 91 to 2 = 13215.
>
> Total IN = 0.28737E+06
>
> OUT:
> ----
> STORAGE = 0.27267E-02
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 2 to 15 = 6813.6
> Zone 2 to 31 = 0.20827E+06
> Zone 2 to 91 = 72291.
>
> Total OUT = 0.28737E+06
>
> IN - OUT = 0.69125E-03
>
> Percent Discrepancy = 0.00
> 1
> mark_zone
>
>
> Flow Budget for Zone 3 at Time Step 1 of Stress Period 3
> -------------------------------------------------------------
>
> Budget Term Flow (L**3/T)
> -----------------------------
>
> IN:
> ---
> STORAGE = 0.0000
> CONSTANT HEAD = 0.0000
> RECHARGE = 81084.
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 31 to 3 = 0.0000
> Zone 91 to 3 = 1262.7
>
> Total IN = 82346.
>
> OUT:
> ----
> STORAGE = 0.18113E-03
> CONSTANT HEAD = 0.0000
> RECHARGE = 0.0000
> STREAM LEAKAGE = 0.0000
> LAKE SEEPAGE = 0.0000
> UZF ET = 0.0000
> GW ET = 0.0000
> UZF INFILTR. = 0.0000
> SFR-DIV. INFLTR. = 0.0000
> UZF RECHARGE = 0.0000
> SURFACE LEAKAGE = 0.0000
> Zone 3 to 31 = 53843.
> Zone 3 to 91 = 28503.
>
> Total OUT = 82346.
>
> IN - OUT = -0.14018E-02
>
> Percent Discrepancy = 0.00
> ")
>
>
> g<-function(txt_con, string, from, to, ...) {
> L <- readLines(txt_con)
> matched <- grep(string, L, value = TRUE, ...)
> as.numeric(substring(matched, from, to))
> }
>
> #Now, strip out values
> edm<-g(txt_con, " RECHARGE =", 37, 50)
>
>
>
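
For the restriction asked about in the quoted message (keep only the
Recharge values that follow a "Zone 2" header), one option that needs no
fixed line count is to label every line with the block header that most
recently preceded it, then filter on that label. This is a sketch under
assumptions: the function name "g_zone" and the "any_hdr" argument are
hypothetical, and "Flow Budget for Zone" is assumed to appear only on block
header lines.

```r
# Sketch only: 'g_zone' and 'any_hdr' are illustrative names.
g_zone <- function(txt_con, hdr_str, srch_str, from, to,
                   any_hdr = "Flow Budget for Zone") {
  L <- readLines(txt_con)
  # TRUE on every block header, whatever the zone
  is_hdr <- grepl(any_hdr, L, fixed = TRUE)
  # block number of each line: 0 before the first header, then 1, 2, ...
  block <- cumsum(is_hdr)
  # block numbers whose header line matches the wanted zone
  wanted <- block[is_hdr & grepl(hdr_str, L, fixed = TRUE)]
  # lines that contain the search string and sit inside a wanted block
  keep <- grepl(srch_str, L, fixed = TRUE) & block %in% wanted
  # cut the fixed-width value out of each kept line and convert it
  as.numeric(substring(L[keep], from, to))
}
```

A call mirroring the quoted one might look like
edm <- g_zone(txt_con, "Flow Budget for Zone 2 at", " RECHARGE =", 37, 50).
The trailing " at" is there so the pattern cannot also match Zone 20 through
29, assuming the headers are spaced as in the snippet above. Likewise, the
search string as shown would also match "UZF RECHARGE =" unless it carries
enough leading spaces to rule that out.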