[R] Help parsing from .txt
arun
smartpink111 at yahoo.com
Wed Oct 23 06:50:47 CEST 2013
Hi,
You may try:
?list.files()
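nm1 <- list.files(pattern="\\.txt$")   # escape the dot and anchor at the end of the name
res <- lapply(nm1, function(x) {
  ln1 <- readLines(x)
  indx1 <- grep("DATE PROCESSED", ln1)
  indx2 <- grep("[A-Z]", ln1)   # lines containing a capital letter, i.e. the headings
  # keep everything up to the line before the first heading that follows DATE PROCESSED
  ln2 <- if(max(indx2)==indx1) ln1 else ln1[1:(indx2[match(indx1,indx2)+1]-1)]
  ln2 <- ln2[ln2!=""]
  indx3 <- grepl("[A-Z]", ln2)
  # consecutive value lines share a group id; each group becomes one column
  indx4 <- cumsum(c(TRUE, diff(which(!indx3))>1))
  mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
  colnames(mat1) <- ln2[indx3][-1]   # the first heading is used as the output file name
  write.table(mat1, paste0(ln2[indx3][1],".txt"), row.names=FALSE, quote=FALSE, sep="\t")
})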
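To see how the grouping step in the code above assigns value lines to columns, here is a minimal sketch that runs it on the sample lines from the original question instead of reading files. The names is_heading, value_pos, group and cols are illustrative only, and [[:upper:]] is used in place of [A-Z] because character ranges can be locale-dependent; that substitution is an assumption, not part of the code above.

```r
# Sample lines copied from the question; no files are read.
ln2 <- c("TITLE", "EXAMPLE", "example 1", "example 2",
         "RELATED TITLE", "related title 1",
         "DATE PROCESSED", "06/12/2011")

is_heading <- grepl("[[:upper:]]", ln2)   # TRUE for the all-caps heading lines
value_pos  <- which(!is_heading)          # positions of the value lines: 3 4 6 8

# A gap larger than 1 between value positions means a heading sits in
# between, so a new group (column) starts there.
group <- cumsum(c(TRUE, diff(value_pos) > 1))   # 1 1 2 3

cols <- split(ln2[!is_heading], group)
names(cols) <- ln2[is_heading][-1]        # drop TITLE, which has no values under it
str(cols)
```

The columns here have lengths 2, 1 and 1, so cbind() recycles the single values to fill both rows.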
A.K.
I have a number of .txt files (1,200) from which I need to parse a
number of pieces of information. The files are read into R as such:
TITLE
EXAMPLE
example 1
example 2
RELATED TITLE
related title 1
DATE PROCESSED
06/12/2011
Some of the files have examples 1-4, others 1-12 and beyond.
How can I create a script that will grab the information from
the different .txt files, put it in a matrix, and write it out to a .csv
file with appropriately named columns? (The column titles are in CAPS
above, and the information that will go in each column is in lower case.)
Thanks in advance.
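
The question asks for .csv output and mentions files with differing numbers of examples, while the code in the reply writes tab-separated .txt and lets cbind() recycle shorter columns. A minimal sketch of one alternative follows. It assumes, as the reply does, that only heading lines contain capital letters. parse_record() is a hypothetical helper, not something from the thread, and padding with NA rather than recycling is a design choice made here for illustration.

```r
# Hypothetical helper (not from the thread): turn one file's lines into a
# character matrix, padding short columns with NA instead of recycling.
parse_record <- function(lines) {
  lines <- lines[nzchar(lines)]                # drop empty lines
  is_heading <- grepl("[[:upper:]]", lines)    # assumes only headings contain capitals
  heading_id <- cumsum(is_heading)             # index of the heading above each line
  # Group value lines by heading; the factor levels keep headings without values.
  cols <- split(lines[!is_heading],
                factor(heading_id[!is_heading],
                       levels = seq_len(sum(is_heading))))
  names(cols) <- lines[is_heading]
  cols <- cols[lengths(cols) > 0]              # drops headings with no values, e.g. TITLE
  n <- max(lengths(cols))
  padded <- lapply(cols, function(v) c(v, rep(NA_character_, n - length(v))))
  do.call(cbind, padded)                       # list names become column names
}

sample_lines <- c("TITLE", "EXAMPLE", "example 1", "example 2",
                  "RELATED TITLE", "related title 1",
                  "DATE PROCESSED", "06/12/2011")
mat <- parse_record(sample_lines)

out <- tempfile(fileext = ".csv")              # temporary path, for the demo only
write.csv(mat, out, row.names = FALSE, na = "")
```

For real files, the same helper could be applied to readLines(path) for each path returned by list.files(pattern = "\\.txt$"), with an output name chosen per file.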