[R] iterators : checkFunc with ireadLines
Laurent Rhelp
L@urentRHe|p @end|ng |rom |ree@|r
Mon May 18 09:05:58 CEST 2020
Dear William,
Thank you for your answer.
My file is very large, so I cannot read it all into memory (I cannot use
read.table). I therefore want to hold in memory only the line I need to
process. With readLines, as I did, it works, but I would like to use an
iterator and a foreach loop to understand that approach, because I thought
it was a better way to write clean code.
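
A minimal sketch of what I have in mind, assuming that a custom iterator can be layered on top of ireadLines in the way the iterators package describes for user-written iterators (a list with a nextElem function and the classes "abstractiter" and "iter"). The name ifilterLines and the shortened mock data are my own inventions for illustration; they are not part of the package.

```r
library(iterators)

# Hypothetical helper (not part of the iterators package): wraps ireadLines()
# and silently skips every line whose first field is not in `keep`.
ifilterLines <- function(con, keep, sep = "\t") {
  it <- ireadLines(con, n = 1)
  nextEl <- function() {
    repeat {
      line <- nextElem(it)  # signals "StopIteration" at end of file
      name <- strsplit(line, sep, fixed = TRUE)[[1]][1]
      if (name %in% keep) return(line)
    }
  }
  structure(list(nextElem = nextEl),
            class = c("ifilterLines", "abstractiter", "iter"))
}

# Shortened, tab-separated mock file (assumed layout: name, then values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N123\t-0.019008\t-0.013494",
             "N163\t-0.054023\t-0.049345",
             "N193\t-0.022171\t-0.022384"), tmp)

it <- ifilterLines(tmp, keep = c("N053", "N163"))

# Pull the selected lines one at a time until the iterator is exhausted
kept <- character(0)
repeat {
  line <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(line)) break
  kept <- c(kept, line)
}
print(kept)
```

Only one line is held in memory at a time, and the filtering lives inside the iterator rather than in the loop body. If the foreach package is installed, I would expect such an object to be usable directly as a foreach argument, but I have not verified that on a large file.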
On 18/05/2020 at 04:54, William Michels wrote:
> Apologies, Laurent, for this two-part answer. I misunderstood your
> post where you stated you wanted to "filter(ing) some
> selected lines according to the line name... ." I thought that meant
> you had a separate index (like a series of primes) that you wanted to
> use to only read-in selected line numbers from a file (test file below
> with numbers 1:1000 each on a separate line):
>
>> library(gmp)
>> library(iterators)
>> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
> Read 1 item
> [1] 2
>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
> Read 1 item
> [1] 3
>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
> Read 1 item
> [1] 5
>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
> Read 1 item
> [1] 7
> However, what it really seems that you want to do is read each line of
> a (possibly enormous) file, test each line "string-wise" to keep or
> discard, and if you're keeping it, append the line to a list. I can
> certainly see the advantage of this strategy for reading in very, very
> large files, but it's not clear to me how the "ireadLines" function (
> in the "iterators" package) will help you, since it doesn't seem to
> generate anything but a sequential index.
>
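
On this point, as far as I can tell ireadLines returns the text of the lines themselves rather than an index. A small check, assuming a throwaway two-line temporary file (the file contents are made up for the test):

```r
library(iterators)

# Throwaway file with two lines
tmp <- tempfile()
writeLines(c("first line", "second line"), tmp)

it <- ireadLines(tmp, n = 1)
a <- nextElem(it)   # text of line 1
b <- nextElem(it)   # text of line 2

# A further call signals an error whose message is "StopIteration"
end_msg <- tryCatch(nextElem(it), error = function(e) conditionMessage(e))

print(a)
print(b)
print(end_msg)
```

So each call to nextElem hands back the next line as a character string, which is what makes a string-wise keep-or-discard test possible.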
> Anyway, below is an absolutely standard read-in of your data using
> read.table(). Hopefully some of the code I've posted has been useful
> to you.
>
>> sensors <- c("N053", "N163")
>> read.table("test2.txt")
>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
> 1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
> 2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
> 4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
> 6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>> Laurent_data <- read.table("test2.txt")
>> Laurent_data[Laurent_data$V1 %in% sensors, ]
>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>
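
For a file that does not fit in memory, the same %in% filter could probably be applied chunk by chunk instead of on the whole table. A rough sketch using base R only, assuming whitespace-separated data like test2.txt; the chunk size of 2 and the shortened mock data are arbitrary choices for the demonstration:

```r
sensors <- c("N053", "N163")

# Shortened stand-in for test2.txt (first three value columns only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time 0.000000 0.000999 0.001999",
             "N023 -0.031323 -0.035026 -0.029759",
             "N053 -0.014083 -0.004741 0.001443",
             "N123 -0.019008 -0.013494 -0.013180",
             "N163 -0.054023 -0.049345 -0.037158",
             "N193 -0.022171 -0.022384 -0.022338"), tmp)

con <- file(tmp, "r")
kept <- list()
repeat {
  chunk <- readLines(con, n = 2)       # only this chunk is held in memory
  if (length(chunk) == 0) break        # end of file
  df <- read.table(text = chunk, stringsAsFactors = FALSE)
  sel <- df[df$V1 %in% sensors, , drop = FALSE]
  if (nrow(sel) > 0) kept[[length(kept) + 1]] <- sel
}
close(con)

# Stack the selected rows and reset the row names
result <- do.call(rbind, kept)
rownames(result) <- NULL
print(result)
```

In practice the chunk size would be much larger; it only needs to be small enough that one chunk plus the accumulated selection fits in memory.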
> Best, Bill.
>
> W. Michels, Ph.D.
>
>
> On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <LaurentRHelp using free.fr> wrote:
>> Dear R-Help List,
>>
>> I would like to use an iterator to read a file while keeping only
>> selected lines, chosen by line name, so that I can then use it in a
>> foreach loop. I wanted to use the checkFunc argument, as in the following
>> example found on the internet, which selects only prime numbers:
>>
>> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>
>> (https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)
>>
>> but the checkFunc argument does not seem to be available in the function
>> ireadLines (package iterators). So I wrote the code below to solve my
>> problem, but I am sure that I am missing something about using iterators
>> with files. Since I found nothing on the web about ireadLines and the
>> checkFunc argument, could somebody help me understand how to use an
>> iterator (and a foreach loop) on files while keeping only selected lines?
>>
>> Thank you very much
>> Laurent
>>
>> Presently here is my code:
>>
>> ## mock file to read: test.txt
>> ##
>> # Time 0 0.000999 0.001999 0.002998 0.003998 0.004997 0.005997 0.006996 0.007996
>> # N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.03369 -0.041067 -0.038747
>> # N053 -0.014083 -0.004741 0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>> # N123 -0.019008 -0.013494 -0.01318 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
>> # N163 -0.054023 -0.049345 -0.037158 -0.04112 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>> # N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>>
>>
>> # sensors to keep
>>
>> sensors <- c("N053", "N163")
>>
>>
>> library(iterators)
>>
>> library(rlist)
>>
>>
>> file_name <- "test.txt"
>>
>> con_obj <- file( file_name , "r")
>> ifile <- ireadLines( con_obj , n = 1 )
>>
>>
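>> ## I do not use a loop in this example.
>> ## Note: the function get_Lines_iter (given below) must be defined first.
>>
>> res <- list()
>>
>> r <- get_Lines_iter( ifile , sensors)
>> res <- list.append( res , r )
>> res
>> r <- get_Lines_iter( ifile , sensors)
>> res <- list.append( res , r )
>> res
>> ## A third call would stop with "The iterator is empty", because no
>> ## selected sensor remains in the file.
>> do.call("cbind",res)
>> close(con_obj)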
>>
>> ## the function get_Lines_iter to select and process the line
>>
>> get_Lines_iter <- function( iter , sensors, sep = '\t', quiet = FALSE){
>>   ## read the next record from the iterator
>>   r <- try( nextElem(iter) )
>>   while( TRUE ){
>>     ## inherits() is safer than class(r) == "try-error"
>>     if( inherits(r, "try-error") ) {
>>       stop("The iterator is empty")
>>     } else {
>>       ## split the line according to the separator
>>       ## (scan(text = ) avoids leaving a textConnection open)
>>       fields <- scan(text = r, what = "character", sep = sep,
>>                      quiet = quiet)
>>       ## test whether we have to keep the line
>>       if( fields[1] %in% sensors){
>>         ## data processing for the selected line (for the example,
>>         ## conversion to a data frame)
>>         n <- length(fields)
>>         x <- data.frame( as.numeric(fields[2:n]) )
>>         names(x) <- fields[1]
>>         ## We return the values
>>         print(paste0("sensor ",fields[1]," ok"))
>>         return( x )
>>       }else{
>>         print(paste0("Sensor ", fields[1] ," not selected"))
>>         r <- try( nextElem(iter) )
>>       }
>>     }
>>   }# end while loop
>> }