[R] Parsing txt file
jim holtman
jholtman at gmail.com
Wed Nov 10 14:10:16 CET 2010
Here is a start:
> # read the input file
> input <- readLines('/tempxx.txt')
> # process the file starting at each "Book"
> result <- lapply(which(grepl("^Book", input)), function(.line){
+ contents <- NULL # initialize
+ name <- strsplit(input[.line], '\t')[[1]][2] # book name
+ # process succeeding lines as long as they are "CD"
+ while (grepl("^CD", input[.line + 1L])){
+ contents <- c(contents, strsplit(input[.line + 1L], '\t')[[1]][3])
+ .line <- .line + 1L
+ }
+ c(bookname = name, contents = paste(contents, collapse = ','))
+ })
>
> do.call(rbind, result)
     bookname             contents
[1,] " bioR "             " chapter5"
[2,] " bioc++ "           " workexamples, experiments"
[3,] " management tools " ""
>
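
The result above keeps the padding around each field and returns "" rather than NA for a book with no CD rows. As a rough sketch of one way to tidy that up (not part of the original reply), the variant below trims each field and groups CD lines under their book with cumsum(). The sample `input` vector stands in for the poster's file, and the tab positions in it are an assumption based on the "#FILE FORMAT" description in the original question. The variable names (`fields`, `block`, `books`) are illustrative only.

```r
# Sample input standing in for the poster's file. The tab positions are an
# assumption based on the "#FILE FORMAT" description in the question.
input <- c(
  "----------",
  "Book\tbioR\txxx\tabc publishers\t230",
  "CD\tbiorexamples\tchapter5",
  "----------",
  "Book\tbioc++\tmmm\ttata publishers\t400",
  "CD\tsamples\tworkexamples",
  "CD\tdata\texperiments",
  "----------",
  "Book\tmanagement tools\taaa\tsome publishers\t200",
  "----------"
)

# Split every line on tabs and trim padding around each field.
fields <- lapply(strsplit(input, "\t", fixed = TRUE), trimws)
type <- vapply(fields, function(f) f[1], character(1))

# %in% is used instead of == so that empty lines (type NA) count as FALSE.
is_book <- type %in% "Book"

# cumsum() over the "Book" flags gives each line the index of the book
# block it belongs to (0 for anything before the first book).
block <- cumsum(is_book)

# Ignore any CD line that appears before the first Book line.
is_cd <- type %in% "CD" & block > 0

bookname <- vapply(fields[is_book], function(f) f[2], character(1))
cd_content <- vapply(fields[is_cd], function(f) f[3], character(1))

# Collect CD contents per block; blocks with no CD rows stay NA.
content <- rep(NA_character_, length(bookname))
joined <- tapply(cd_content, block[is_cd], paste, collapse = ", ")
content[as.integer(names(joined))] <- joined

books <- data.frame(bookname = bookname, content = content,
                    stringsAsFactors = FALSE)
print(books)
```

To run this against a real file, replace the sample vector with the result of readLines() on that file. The rest should carry over as long as the fields really are separated by tabs.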
On Wed, Nov 10, 2010 at 5:30 AM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
> You could use the following functions to achieve your objective. To start
> with, see:
>
> ?readLines
> ?strsplit
> ?"for"
> ?ifelse
>
> As you try, you may receive more specific answers for the issues you come up
> with.
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of karthicklakshman
> Sent: 10 November 2010 15:06
> To: r-help at r-project.org
> Subject: [R] Parsing txt file
>
>
> Hello,
>
> I have a tab-delimited text document with multiple lines, as shown below:
>
>
>
> #FILE FORMAT
> #Book bookname author publisher pages
> #CD name content
> ####################################################################################################
> ----------------------------------------------------------------------
> Book bioR xxx abc publishers 230
> CD biorexamples chapter5
> ----------------------------------------------------------------------
> Book bioc++ mmm tata publishers 400
> CD samples workexamples
> CD data experiments
> ----------------------------------------------------------------------
> Book management tools aaa some publishers 200
> ----------------------------------------------------------------------
>
>
> Here the row labels "Book" and "CD" mark the lines within each block.
>
> Now, I am interested in creating a data frame with two columns, named
> "bookname" and "content". Using grep it is possible to pick specific
> rows (grep("^Book", filename)), but my programming expertise is too limited
> to create the data frame described above.
>
> Note: the "Book" row is present in all blocks, but the number of "CD" rows
> varies (i.e., some blocks have two and some have no CD row, as shown above).
>
> Please help me create something like this:
>
>
>      bookname          content
> [1]  bioR              chapter5
> [2]  bioc++            workexamples, experiments
> [3]  management tools  NA
>
>
> Thanks in advance,
> karthick
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Parsing-txt-file-tp3035749p3035749.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?