[R] Reading data from a text file conditionally skipping lines

arun smartpink111 at yahoo.com
Thu Apr 25 23:30:29 CEST 2013


Hi,
It would be better to give an example.
If your dataset is like the one attached:
con<-file("Trial1.txt")
Lines1<-readLines(con)
close(con)
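#If the data you want to extract are numeric and the header and footer lines contain letters,
#drop every line that contains a letter. !grepl() is used rather than -grep() because
#Lines1[-grep(...)] returns nothing at all when no line matches.
dat1<-read.table(text=Lines1[!grepl("[A-Za-z]",Lines1)],sep="\t",header=FALSE)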
dat1
#   V1 V2 V3 V4 V5
#1  38 43 39 44 45
#2  39 44 36 49 46
#3  42 45 47 49 37
#4  34 43 39 45 45
#5  38 42 39 44 47
#6  43 44 46 42 37
#7  32 49 38 42 45
#8  34 45 35 49 46
#9  44 45 46 49 37
#10 34 43 39 48 49
#11 38 42 39 47 47
#12 43 44 46 42 37
#13 37 43 39 44 45
#14 39 42 36 49 46
#15 42 45 47 49 37
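
If you cannot open the attachment, here is a small self-contained sketch of the same filtering idea. The vector lines_demo is made up for illustration (it is not the contents of Trial1.txt), and instead of dropping lines that contain letters it keeps only lines that consist entirely of digits, tabs, dots and minus signs, which may be a bit stricter if the text lines can contain no letters at all.

```r
# Made-up stand-in for the file: 2 text lines, then 3 tab-separated rows, twice
lines_demo <- c(
  "Some header text",
  "More header text",
  "38\t43\t39\t44\t45",
  "39\t44\t36\t49\t46",
  "42\t45\t47\t49\t37",
  "Some footer text",
  "More footer text",
  "34\t43\t39\t45\t45",
  "38\t42\t39\t44\t47",
  "43\t44\t46\t42\t37"
)

# TRUE for lines made up only of digits, dots, tabs and minus signs
is_data <- grepl("^[0-9.\t-]+$", lines_demo)

# Parse just the data lines; each element of the vector is treated as one row
dat_demo <- read.table(text = lines_demo[is_data], sep = "\t", header = FALSE)
dim(dat_demo)
#[1] 6 5
```

Whether a "keep numeric lines" or a "drop lines with letters" rule fits better depends on what the header and footer lines in the real file look like.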

#or
You mentioned that the data is repeated "every so many lines".  Here, too, there is a repeating pattern: two lines of text followed by three lines of data.

head(Lines1,10)
# [1] "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. "
# [2] "Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis"
# [3] "38\t43\t39\t44\t45"
# [4] "39\t44\t36\t49\t46"
# [5] "42\t45\t47\t49\t37"
# [6] "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat."
# [7] "Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi."
# [8] "34\t43\t39\t45\t45"
# [9] "38\t42\t39\t44\t47"
#[10] "43\t44\t46\t42\t37"
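
If you are not sure the pattern really is regular, one way to check (a sketch on made-up lines, not on the attached file) is to run rle() over the "contains a letter" test. It reports how long each run of text lines and data lines is.

```r
# Made-up lines with the same shape as the file: 2 text lines, 3 data lines, repeated
lines_demo <- rep(c("some text", "more text", "1\t2", "3\t4", "5\t6"), 2)

# Run-length encoding of "does this line contain a letter?"
runs <- rle(grepl("[A-Za-z]", lines_demo))
runs$lengths
#[1] 2 3 2 3
runs$values
#[1]  TRUE FALSE  TRUE FALSE
```

If the lengths do not alternate between the same two numbers, a fixed rep() mask would not be safe and the pattern-based filter is the better choice.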



dat2<-read.table(text=Lines1[rep(rep(c(FALSE,TRUE),times=c(2,3)),5)],sep="\t",header=FALSE)
 identical(dat1,dat2)
#[1] TRUE
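
The rep() call above hardcodes 2 text lines, 3 data lines and 5 blocks. A minimal sketch of the same mask built from the number of lines instead; block_mask is a hypothetical helper name, and it assumes the file starts with the text lines and every block has the same shape.

```r
# Hypothetical helper: FALSE for the n_skip text lines, TRUE for the n_keep data
# lines that follow, recycled until the mask is n_lines long
block_mask <- function(n_lines, n_skip, n_keep) {
  rep(rep(c(FALSE, TRUE), times = c(n_skip, n_keep)), length.out = n_lines)
}

# 25 lines = 5 blocks of (2 text + 3 data), as in the example above
mask <- block_mask(25, n_skip = 2, n_keep = 3)
identical(mask, rep(rep(c(FALSE, TRUE), times = c(2, 3)), 5))
#[1] TRUE
```

With the real file this would be used as Lines1[block_mask(length(Lines1), 2, 3)], so the number of blocks does not have to be counted by hand.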

A.K.





>I have a text file that is nicely formatted (tab separated). However, it has some header and footer information after every so many lines.  I do not
>want to read this information in my dataframe.  What is the best
>way to read this data into R.  Thanks for all the help!
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Trial1.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130425/198e6c0c/attachment.txt>

