[R] using filter while Reading files -
jim holtman
jholtman at gmail.com
Thu Sep 17 02:15:17 CEST 2009
Here is one way to create a list of data frames, each named after its TABLE line:
> x <- readLines('/tempxx.txt') # read in your data
> # assume that 'x' was read in with readLines
> input <- textConnection(x)
> # find the "TABLE" lines and use as the names of the dataframes to read
> indx <- c(grep("^TABLE", x), length(x) + 1) # add index for end of data
> indx.diff <- diff(indx) # sizes of each section
> # assume first line is a TABLE
> result <- list() # initialize output list of dataframes
> for (i in seq_along(indx.diff)){ # seq_along is safe even if no TABLE lines are found
+ df.name <- readLines(input, n=1) # read in the name
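+ # nrows excludes the name line and the header line of each section
+ result[[df.name]] <- read.table(input, header=TRUE, nrows=indx.diff[i] - 2,
+ colClasses='numeric') # recycled across columns, so the column count need not be 6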
+ }
> close(input)
> result
$`TABLE NO. 1: Gold `
  R1     T1     T2      T3     T4      T5
1  0 36.800 1410.0 4940.00 23.300 49.0000
2 43 37.787 2462.6 4442.27 23.139 48.4272
3 -1 36.787 1462.6 4442.27 23.139 48.4271

$`TABLE NO. 2: Silver `
  R1     T1     T2      T3     T4      T5
1  0 36.800 1416.6 4540.00 28.900 49.0000
2 56 36.787 5462.6 4942.27 24.239 48.4272
3 -1 86.787 9462.6 4942.27 23.139 48.4271
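
The same split can also be sketched without a connection or a loop. This is only an illustration, not a tested replacement for the code above: it assumes a version of R where read.table() accepts a 'text' argument and where trimws() exists, it assumes the first line is a TABLE line, and the names 'is.title', 'blocks' and 'result2' are made up for the example. The sample lines are copied from the original post with the mail line wraps removed.

```r
# Sample data from the original post (mail line wraps removed).
x <- c(
  "TABLE NO. 1: Gold",
  " R1 T1 T2 T3 T4 T5 ",
  " 0 3.68000E+01 1.41000E+03 4.94000E+03 2.33000E+01 4.90000E+01",
  " 43 3.77870E+01 2.46260E+03 4.44227E+03 2.31390E+01 4.84272E+01",
  " -1 3.67870E+01 1.46260E+03 4.44227E+03 2.31390E+01 4.84271E+01",
  "TABLE NO. 2: Silver",
  " R1 T1 T2 T3 T4 T5 ",
  " 0 3.68000E+01 1.41660E+03 4.54000E+03 2.89000E+01 4.90000E+01",
  " 56 3.67870E+01 5.46260E+03 4.94227E+03 2.42390E+01 4.84272E+01",
  " -1 8.67870E+01 9.46260E+03 4.94227E+03 2.31390E+01 4.84271E+01"
)

is.title <- grepl("^TABLE", x)        # TRUE on each "TABLE ..." line
blocks <- split(x, cumsum(is.title))  # one character vector per section

# In each block the first element is the title; the rest is header + data.
# No column count is given, so sections may differ in width.
result2 <- lapply(blocks, function(b) read.table(text = b[-1], header = TRUE))
names(result2) <- trimws(vapply(blocks, function(b) b[1], character(1),
                                USE.NAMES = FALSE))
```

A single table can then be pulled out by name, for example result2[["TABLE NO. 2: Silver"]]. Whether this is any faster than the loop on 30000+ lines is not something I have measured.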
On Wed, Sep 16, 2009 at 6:30 PM, Santosh <santosh2005 at gmail.com> wrote:
> Hi R'sians
> As the experts here suggested, I am using "scan" and "readLines" to read
> text files. I notice that read.table takes a long time to read and process
> character vectors of 30000+ rows.
>
> How do I separate out the columns in the resulting character vector? The
> function "read.fwf" appears to be a bit cumbersome to use, as the number of
> columns in the text files is not constant and some preprocessing is needed
> to obtain the number of columns.
>
> Would really appreciate your ideas!!
>
> Below is the embedded data from the attached text file
> "TABLE NO. 1: Gold"
> " R1 T1 T2 T3 T4 T5 "
> " 0 3.68000E+01 1.41000E+03 4.94000E+03 2.33000E+01 4.90000E+01"
> " 43 3.77870E+01 2.46260E+03 4.44227E+03 2.31390E+01 4.84272E+01"
> " -1 3.67870E+01 1.46260E+03 4.44227E+03 2.31390E+01 4.84271E+01"
> "TABLE NO. 2: Silver"
> " R1 T1 T2 T3 T4 T5 "
> " 0 3.68000E+01 1.41660E+03 4.54000E+03 2.89000E+01 4.90000E+01"
> " 56 3.67870E+01 5.46260E+03 4.94227E+03 2.42390E+01 4.84272E+01"
> " -1 8.67870E+01 9.46260E+03 4.94227E+03 2.31390E+01 4.84271E+01"
>
> Thanks,
> Santosh
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?