[R] Reading recurring data in a text file

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Wed Jul 24 21:18:56 CEST 2019


Hello,

This is far from a complete answer.

A quick one: no loops. grep() is vectorized and returns the indices of all matching lines in one call.

mc_list2 <- grep(srchStr1, lines)
tmp_list2 <- grep(srchStr2, lines)

identical(mc_list, mc_list2)    # [1] TRUE
identical(tmp_list, tmp_list2)  # [1] TRUE
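A small self-contained illustration of the same point, with made-up lines standing in for the real file. The loop calls grepl() once per line and grows the index vector. A single grep() call returns the same integer indices.

```r
# Toy lines (hypothetical), standing in for the output of readLines()
lines <- c("MOISTURE CONTENT", "0.075 0.1475", "TEMPERATURE", "0.075 1.1475",
           "MOISTURE CONTENT", "0.075 0.1875")

# Loop version: test one line at a time and grow the result
loop_idx <- NULL
for (i in seq_along(lines)) {
  if (grepl("MOISTURE CONTENT", lines[i])) loop_idx <- c(loop_idx, i)
}

# Vectorized version: grep() tests every line and returns the matching indices
# (which(grepl(...)) would give the same result)
vec_idx <- grep("MOISTURE CONTENT", lines)

identical(loop_idx, vec_idx)  # TRUE
```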


Another one: don't grow lists or vectors inside loops; reserve the memory 
beforehand.

wc <- vector("list", length = length(mc_list))
tmp <- vector("list", length = length(tmp_list))


are much better than your

wc <- list()
tmp <- list()
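A minimal sketch of two ways to avoid growing the list, on a hypothetical toy input (the "HEADER" marker and the two-row blocks are made up for illustration). The first fills a preallocated list by index. The second lets lapply() build the list, which also allocates its result once.

```r
# Toy input (hypothetical): each "HEADER" line is followed by two data rows
lines <- c("HEADER", "1 10", "2 20", "HEADER", "3 30", "4 40")
starts <- grep("HEADER", lines)

# Reserve the list once, then fill it by index
out <- vector("list", length = length(starts))
for (i in seq_along(starts)) {
  out[[i]] <- read.table(text = lines[starts[i] + 1:2],
                         col.names = c("depth", "value"))
}

# lapply() creates a result list of the right length by itself
out2 <- lapply(starts, function(s)
  read.table(text = lines[s + 1:2], col.names = c("depth", "value")))

identical(out, out2)  # TRUE
```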


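Maybe I will find ways to save time with the really slow instructions, probably the read.table() calls, each of which runs the whole of wha through a new text connection.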
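A sketch of one possible speed-up, assuming every block has the layout of the example (the first data row 4 lines after the header line, and a fixed number of rows). The original code calls read.table(text = wha, skip = ...) once per block. Here the lines are read once, the data lines of all blocks are located by index arithmetic, parsed with a single read.table() call, and then split back into one data frame per block. read_blocks() is a hypothetical helper, not a function from any package, and it has not been timed on the real file.

```r
# Hypothetical helper (not from any package): extract fixed-size numeric
# blocks whose first data row sits `offset` lines after each header line.
# Assumes every block is complete (no truncated block at the end of the file).
read_blocks <- function(lines, pattern, offset = 4, nrows = 5,
                        col.names = c("depth", "value")) {
  starts <- grep(pattern, lines)          # header line of every block
  if (length(starts) == 0) return(list())
  # Line numbers of all data rows, block after block (column-major order)
  idx <- as.vector(outer(seq_len(nrows) - 1, starts + offset, "+"))
  # A single read.table() call parses the data rows of every block
  vals <- read.table(text = lines[idx], col.names = col.names)
  # Cut the combined table back into one data frame per block
  blocks <- split(vals, rep(seq_along(starts), each = nrows))
  unname(lapply(blocks, function(b) { rownames(b) <- NULL; b }))
}

# Toy input with the same layout as the example, two rows per block
lines <- c("      MOISTURE CONTENT",
           "   Z, IN",
           "   m                       X OR R DISTANCE, IN m",
           "                 0.500",
           "      0.075     0.1475",
           "      0.225     0.1475",
           "blah",
           "      MOISTURE CONTENT",
           "   Z, IN",
           "   m                       X OR R DISTANCE, IN m",
           "                 0.500",
           "      0.075     0.1875",
           "      0.225     0.1775",
           "blah")

wc <- read_blocks(lines, "MOISTURE CONTENT", nrows = 2,
                  col.names = c("depth", "wc"))
wc[[2]]
```

For the real file, something like lines <- readLines("output.txt") (file name hypothetical), followed by wc <- read_blocks(lines, srchStr1, col.names = c("depth", "wc")) and tmp <- read_blocks(lines, srchStr2, col.names = c("depth", "tmp")), should give lists shaped like the original wc and tmp, with the file opened only once.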

Hope this helps,

Rui Barradas


On 24/07/19 at 19:54, Morway, Eric via R-help wrote:
> The small reproducible example below works, but is way too slow on the real
> problem.  The real problem is attempting to extract ~2920 repeated arrays
> from a 60 Mb file and takes ~80 minutes.  I'm wondering how I might
> re-engineer the script to avoid opening and closing the file 2920 times as
> is the case now.  That is, is there a way to keep the file open and peel
> out the arrays and stuff them into a list of data.tables, as is done in the
> small reproducible example below, but in a significantly faster way?
> 
> wha <- "     INITIAL PRESSURE HEAD
>       INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
>       VS2DH - MedSand for TL test
> 
>       TOTAL ELAPSED TIME =  0.000000E+00 sec
>       TIME STEP         0
> 
>       MOISTURE CONTENT
>    Z, IN
>    m                       X OR R DISTANCE, IN m
>                  0.500
>       0.075     0.1475
>       0.225     0.1475
>       0.375     0.1475
>       0.525     0.1475
>       0.675     0.1475
> blah
> blah
> blah
>       TEMPERATURE, IN DECREES C
>    Z, IN
>    m                       X OR R DISTANCE, IN m
>                  0.500
>       0.075     1.1475
>       0.225     2.1475
>       0.375     3.1475
>       0.525     4.1475
>       0.675     5.1475
> blah
> blah
> blah
> 
>       TOTAL ELAPSED TIME =  8.6400E+04 sec
>       TIME STEP         0
> 
>       MOISTURE CONTENT
>    Z, IN
>    m                       X OR R DISTANCE, IN m
>                  0.500
>       0.075     0.1875
>       0.225     0.1775
>       0.375     0.1575
>       0.525     0.1675
>       0.675     0.1475
> blah
> blah
> blah     TEMPERATURE, IN DECREES C
>    Z, IN
>    m                       X OR R DISTANCE, IN m
>                  0.500
>       0.075     1.1475
>       0.225     2.1475
>       0.375     3.1475
>       0.525     4.1475
>       0.675     5.1475
> blah
> blah
> blah"
> 
> example_content <- textConnection(wha)
> 
> srchStr1 <- '     MOISTURE CONTENT'
> srchStr2 <- 'TEMPERATURE, IN DECREES C'
> 
> lines   <- readLines(example_content)
> mc_list <- NULL
> for (i in 1:length(lines)){
>    # Look for start of water content
>    if(grepl(srchStr1, lines[i])){
>      mc_list <- c(mc_list, i)
>    }
> }
> 
> tmp_list <- NULL
> for (i in 1:length(lines)){
>    # Look for start of temperature data
>    if(grepl(srchStr2, lines[i])){
>      tmp_list <- c(tmp_list, i)
>    }
> }
> 
> # Store the water content arrays
> wc <- list()
> # Read all the moisture content profiles
> for(i in 1:length(mc_list)){
>    lineNum <- mc_list[i] + 3
>    mct <- read.table(text = wha, skip=lineNum, nrows=5,
>                      col.names=c('depth','wc'))
>    wc[[i]] <- mct
> }
> 
> # Store the water temperature arrays
> tmp <- list()
> # Read all the temperature profiles
> for(i in 1:length(tmp_list)){
>    lineNum <- tmp_list[i] + 3
>    tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
>                      col.names=c('depth','tmp'))
>    tmp[[i]] <- tmpt
> }
> 
> # quick inspection
> length(wc)
> wc[[1]]
> # Looks like what I'm after, but too slow in real world problem
> 
