[R] Fwd: Narrowing values collected from .txt file

jim holtman jholtman at gmail.com
Thu Aug 29 14:43:53 CEST 2013


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



---------- Forwarded message ----------
From: jim holtman <jholtman at gmail.com>
Date: Thu, Aug 29, 2013 at 8:43 AM
Subject: Re: [R] Narrowing values collected from .txt file
To: "Morway, Eric" <emorway at usgs.gov>


FYI, I duplicated your data into a 100MB file and it took less than 10
seconds to process.
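
The code behind that timing is not shown in this message. As a rough
sketch only, and not necessarily the approach used for the timing above,
one way to avoid re-reading the file for every header is to read it once
with readLines() and index the in-memory vector. The names fetch_values
and n_lines are hypothetical, and two simplifying assumptions are made:
the 42-line window is counted in raw lines (blank lines included), and
the header and search strings are matched literally (fixed = TRUE).

```r
# Hypothetical helper (not from the thread): read the file a single time,
# then do all searching on the in-memory character vector.
fetch_values <- function(path, hdr_str, srch_str, from, to, n_lines = 42) {
  L <- readLines(path)

  # Line numbers of every section header (literal match, an assumption).
  starts <- grep(hdr_str, L, fixed = TRUE)
  if (length(starts) == 0L) return(numeric(0))

  # Indices of the n_lines lines beginning at each header, clipped to the
  # end of the file.
  idx <- unlist(lapply(starts, function(s) {
    seq(s, min(s + n_lines - 1L, length(L)))
  }))

  # Keep only the lines holding the desired value, then parse the
  # fixed-width field in columns from..to.
  hits <- grep(srch_str, L[idx], fixed = TRUE, value = TRUE)
  as.numeric(substring(hits, from, to))
}

# Usage with the values from the original post (path is the poster's):
# rech_z2 <- fetch_values("c:/temp/MCR_Budgets.txt",
#                         "Flow Budget for Zone  2", "  RECHARGE =", 37, 51)
```

Because the file is opened once, the cost no longer grows with the number
of matched headers times the file size, which is where the original loop
spends its time.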


On Wed, Aug 28, 2013 at 7:45 PM, Morway, Eric <emorway at usgs.gov> wrote:
> It looks as though the attachment to my last post didn't make the cut (or
> at least it's not appearing on the Nabble forum), for one reason or
> another.  I'm reattaching a smaller version so folks can run the code
> (won't work without a text file to operate on).  So, while the attached
> file is only a small sample of the larger file and will therefore run
> quickly, it would still be helpful if someone knows a more efficient
> approach to the code in the previous post.
>
>
> On Wed, Aug 28, 2013 at 11:28 AM,
>
>>
>> A relatively concise, commented, working solution to the problem
>> originally motivating this thread was found (below).  I suspect the
>> approach I've taken has a major inefficiency through the use of the
>> "scan" statement appearing inside the function "g".  The way the code
>> works right now, it has to re-open and read the file 'length(matched)
>> times' rather than sequentially reading through to the next pertinent
>> section of the txt file.  Does anyone have a more efficient approach in
>> mind so I don't have to wait 1/2 hour to get the results? (The only
>> adjustment to the code that follows is to point "txt" to wherever the
>> attached file is placed)
>>
>>
>> # where is the file?
>> txt<-"c:/temp/MCR_Budgets.txt"
>>
>> # Demarcation header
>> hdr_str<-"Flow Budget for Zone  2"
>>
>> # string to identify lines with desired values
>> srch_str<-"  RECHARGE ="
>>
>> # retrieves desired values
>> g<-function(txt_con, hdr_str, srch_str, from, to, ...) {
>>
>>     L <- readLines(txt_con)
>>
>>     #matched contains the line #s w/ hdr_str
>>     matched <- grep(hdr_str, L, value = FALSE, ...)
>>
>>     #initialize output list
>>     fetched_list<-numeric()
>>
>>     #for each instance of hdr_str, loop
>>     for(i in seq_along(matched)){
>>
>>       #retrieve a section of text following each hdr_str
>>       snippet<-scan(txt_con, what=character(), skip=matched[i]-1,
>>                     n=42, sep='\n')
>>
>>       #get data within the short section of retrieved text
>>       fetched <- grep(srch_str, snippet, value=TRUE)
>>
>>       #append output vector for plotting time series
>>       fetched_list <- c(fetched_list,
>>                         as.numeric(substring(fetched, from, to)))
>>
>>       #monitor
>>       print(i)
>>     }
>>
>>     #return desired values
>>     as.numeric(fetched_list)
>> }
>>
>> #The results of system.time reflect full 147 MB file,
>> # only half of which is attached.
>> system.time(
>>   rech_z2<-g(txt,hdr_str,srch_str,37,51)
>> )
>> #   user  system elapsed
>> #1740.48   36.08 1825.77
>>
>>
>


