[R] Parsing txt file

Wed Nov 10 14:10:16 CET 2010

Here is a start:

> # read the input file
> input <- readLines('/tempxx.txt')
> # process the file starting at each "Book"
> result <- lapply(which(grepl("^Book", input)), function(.line){
+     contents <- NULL  # initialize
+     name <- strsplit(input[.line], '\t')[[1]][2]  # book name
+     # process succeeding lines as long as they are "CD"
+     while (grepl("^CD", input[.line + 1L])){
+         contents <- c(contents, strsplit(input[.line + 1L], '\t')[[1]][3])
+         .line <- .line + 1L
+     }
+     c(bookname = name, contents = paste(contents, collapse = ','))
+ })
>
> do.call(rbind, result)
     bookname              contents
[1,] " bioR  "             "   chapter5"
[2,] "  bioc++ "           " workexamples,  experiments"
[3,] " management tools  " ""
>
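The transcript above has two gaps relative to the question. It returns "" for a book with no CD lines, while the question asked for NA, and it keeps the whitespace around the names. The variation below is a minimal sketch that fixes both. It assumes the same tab-delimited layout as the sample in the quoted question, trims each field, and builds a data frame row per book with NA when no CD row follows. The sample lines are typed in so the snippet runs on its own; on a real file you would use readLines() instead. Note that trimws() is only available in R 3.2.0 and later.

```r
# Sample lines from the question (assumption: fields separated by single tabs).
# On the real file use: input <- readLines('/tempxx.txt')
input <- c(
  "#FILE FORMAT",
  "#Book\tbookname\tauthor\tpublisher\tpages",
  "#CD\tname\tcontent",
  "----------------------------------------------------------------------",
  "Book\tbioR\txxx\tabc publishers\t230",
  "CD\tbiorexamples\tchapter5",
  "----------------------------------------------------------------------",
  "Book\tbioc++\tmmm\ttata publishers\t400",
  "CD\tsamples\tworkexamples",
  "CD\tdata\texperiments",
  "----------------------------------------------------------------------",
  "Book\tmanagement tools\taaa\tsome publishers\t200",
  "----------------------------------------------------------------------"
)

# indices of the lines that start a block ("#Book" header lines do not match)
book_lines <- which(grepl("^Book", input))

result <- lapply(book_lines, function(.line) {
  name <- trimws(strsplit(input[.line], "\t")[[1]][2])  # book name
  contents <- character(0)
  # grepl() returns FALSE for NA, so stepping past the last line ends the loop
  while (grepl("^CD", input[.line + 1L])) {
    .line <- .line + 1L
    contents <- c(contents, trimws(strsplit(input[.line], "\t")[[1]][3]))
  }
  # one-row data frame; NA when the block has no CD rows
  data.frame(bookname = name,
             content = if (length(contents) > 0) paste(contents, collapse = ", ")
                       else NA_character_,
             stringsAsFactors = FALSE)
})

books <- do.call(rbind, result)
print(books)
```

Building a one-row data frame per block and binding them with do.call(rbind, ...) keeps the column types as character, so the missing content shows up as a real NA rather than an empty string.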

On Wed, Nov 10, 2010 at 5:30 AM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
> You could use the following functions to achieve your objective. To start with:
>
> ?readLines
> ?strsplit
> ?"for"
> ?ifelse
>
> As you try, you may receive more specific answers for the issues you come up
> with.
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of karthicklakshman
> Sent: 10 November 2010 15:06
> To: r-help at r-project.org
> Subject: [R] Parsing txt file
>
>
> Hello,
>
> I have a tab-delimited text file with multiple lines, as shown below:
>
>
>
> #FILE FORMAT
> #Book   bookname        author  publisher       pages
> #CD     name    content
> ################################################################################################################################################################
> ----------------------------------------------------------------------
> Book    bioR    xxx     abc publishers  230
> CD      biorexamples    chapter5
> ----------------------------------------------------------------------
> Book    bioc++  mmm     tata publishers 400
> CD      samples workexamples
> CD      data    experiments
> ----------------------------------------------------------------------
> Book    management tools        aaa     some publishers 200
> ----------------------------------------------------------------------
>
>
> Here the keywords "Book" and "CD" mark the lines within each block.
>
> Now I would like to create a data frame with two columns named "bookname"
> and "content". Using grep I can pick out specific rows
> (grep("^Book", filename)), but my programming experience is not enough to
> build that data frame.
>
> Note: the "Book" row is present in every block, but the number of "CD" rows
> varies (i.e., some blocks have two CD rows and some have none, as shown above).
>
> Please help me create something like this:
>
>
>     bookname           content
> [1] bioR               chapter5
> [2] bioc++             workexamples, experiments
> [3] management tools   NA
>
>
> Thanks in advance,
> karthick
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Parsing-txt-file-tp3035749p3035749.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
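Following the functions suggested in the quoted reply (readLines, strsplit, for, ifelse), the same table can also be built in a single pass with a loop. This is a sketch, assuming every field is separated by a single tab; the short input vector is a hypothetical stand-in for readLines('/tempxx.txt').

```r
# Hypothetical stand-in for readLines('/tempxx.txt'); assumes single-tab separators
input <- c(
  "----------------------------------------------------------------------",
  "Book\tbioR\txxx\tabc publishers\t230",
  "CD\tbiorexamples\tchapter5",
  "Book\tbioc++\tmmm\ttata publishers\t400",
  "CD\tsamples\tworkexamples",
  "CD\tdata\texperiments",
  "Book\tmanagement tools\taaa\tsome publishers\t200"
)

bookname <- character(0)
content  <- character(0)

for (ln in input) {
  fields <- strsplit(ln, "\t")[[1]]
  if (length(fields) == 0) next  # skip blank lines
  if (fields[1] == "Book") {
    # a new block: record the name; content stays NA until a CD line appears
    bookname <- c(bookname, fields[2])
    content  <- c(content, NA_character_)
  } else if (fields[1] == "CD" && length(content) > 0) {
    # append this CD's content to the most recent book
    i <- length(content)
    content[i] <- ifelse(is.na(content[i]), fields[3],
                         paste(content[i], fields[3], sep = ", "))
  }
}

books <- data.frame(bookname = bookname, content = content,
                    stringsAsFactors = FALSE)
print(books)
```

Every other line (headers, dashed separators) falls through both branches, so only Book and CD rows affect the result.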

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?