Try this:
> cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
+ "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
+ "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon"),
+ sep = "\n", file = "tmp.txt")
>
> # read in the data and parse it assuming it has the same structure
> input <- readLines('tmp.txt')
> # parse it item by item
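> x.id <- sub(".*\\[ID: *([[:digit:]]+).*", "\\1", input)
> # capture the writer, then drop the padding space before the closing "]"
> x.writer <- sub(" +$", "", sub(".*\\[Writer: *([^]]+).*", "\\1", input))
> x.rating <- sub(".*\\[Rating: *([0-9.]+).*", "\\1", input)
> # programme name: everything after the last "]", minus leading spaces
> x.prog <- sub(".*\\] *(.*)", "\\1", input)
> # create the data frame
> data.frame(id = x.id, writer = x.writer, rating = x.rating, prog = x.prog,
+ stringsAsFactors = FALSE)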
   id                 writer rating       prog
1 001          Steven Moffat    8.9 Doctor Who
2 002            Joss Whedon    8.8      Buffy
3 003 J. Michael Straczynski    7.4    Babylon
>
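One caveat: the sub() calls above take whatever follows the last "]" as the programme name, so a title that itself contains brackets, such as the "Babylon [5]" in the original question, would come out empty. Below is a rough sketch of one way around that. It assumes every line carries exactly the three bracketed fields in the order ID, Writer, Rating, and it matches the whole line once with regexec() and regmatches() (both need R 2.14.0 or later). The name parse_records and the column names are only illustrative and do not come from any package.

```r
# Hypothetical helper (not from the thread): one record per input line.
parse_records <- function(lines) {
  # Three leading "[Key: value ]" fields, then free text that may hold brackets.
  pattern <- paste0("^\\[ID: *([^]]*)\\] *",
                    "\\[Writer: *([^]]*)\\] *",
                    "\\[Rating: *([^]]*)\\] *(.*)$")
  parts <- regmatches(lines, regexec(pattern, lines))

  # A successful match yields the full match plus four captured groups.
  n <- vapply(parts, length, integer(1))
  if (any(n != 5L)) {
    stop("unparseable line(s): ", paste(which(n != 5L), collapse = ", "))
  }

  # Strip leading and trailing whitespace from a character vector.
  trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

  fields <- do.call(rbind, parts)[, -1, drop = FALSE]
  data.frame(ID     = trim(fields[, 1]),
             Writer = trim(fields[, 2]),
             Rating = trim(fields[, 3]),
             Text   = trim(fields[, 4]),
             stringsAsFactors = FALSE)
}

# Same records as in the question; readLines("tmp.txt") would give this vector.
input <- c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
           "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
           "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]")
DF <- parse_records(input)
print(DF)
```

Rating is left as character to match the data frame asked for in the question; as.numeric(DF$Rating) would convert it if numbers are wanted. Stopping on a line that does not match is a design choice here, so that a malformed record is noticed rather than silently turned into a wrong row.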
On Thu, May 6, 2010 at 9:58 AM, Tony B wrote:
> Dear all
>
> Let's say I have a plain text file as follows:
>
> > cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
> + "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
> + "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]"),
> + sep = "\n", file = "tmp.txt")
>
> I would like to read this file into R and convert it into a
> data frame like this:
>
> > DF <- data.frame(ID = c("001", "002", "003"),
> + Writer = c("Steven Moffat", "Joss Whedon", "J. Michael Straczynski"),
> + Rating = c("8.9", "8.8", "7.4"),
> + Text = c("Doctor Who", "Buffy", "Babylon [5]"),
> + stringsAsFactors = FALSE)
>
>
> My initial thought was to use readLines on the text file, then apply
> some regular expressions and perhaps strsplit(..). Having confused
> myself after several attempts, I was wondering whether there is a
> simpler way, perhaps using read.table instead? My end goal is to
> convert DF into an XML structure.
>
> Thank you kindly in advance for your time,
> Tony Breyal
>
> # Windows Vista
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_2.8-1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
>
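On the stated end goal of turning DF into XML: here is a minimal sketch in base R only, to show the shape of the conversion. It assumes the column names are already valid XML element names and that one element per row with one child per column is the layout wanted. The helper names xml_escape and df_to_xml, and the element names "programmes" and "programme", are made up for illustration. The XML package that is already attached in the session above would be the more usual tool for anything beyond this.

```r
# Hypothetical helpers (not from the thread or from any package).

# Escape the characters that are special inside XML text content.
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Build one <row> element per data frame row, one child element per column.
df_to_xml <- function(df, root = "programmes", row = "programme") {
  cols <- lapply(names(df), function(nm) {
    sprintf("    <%s>%s</%s>", nm, xml_escape(as.character(df[[nm]])), nm)
  })
  # Paste the columns together row by row, one child element per line.
  rows <- do.call(paste, c(cols, sep = "\n"))
  rows <- sprintf("  <%s>\n%s\n  </%s>", row, rows, row)
  paste(c(sprintf("<%s>", root), rows, sprintf("</%s>", root)),
        collapse = "\n")
}

# The data frame from the question.
DF <- data.frame(ID = c("001", "002", "003"),
                 Writer = c("Steven Moffat", "Joss Whedon",
                            "J. Michael Straczynski"),
                 Rating = c("8.9", "8.8", "7.4"),
                 Text = c("Doctor Who", "Buffy", "Babylon [5]"),
                 stringsAsFactors = FALSE)

xml <- df_to_xml(DF)
cat(xml, "\n")
```

The result is a single string, so writeLines(xml, "tmp.xml") would save it to a file. Escaping "&" before "<" and ">" matters, otherwise the ampersands introduced by the later replacements would be escaped a second time.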
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?