[R] How to read in this data format?

Gabor Grothendieck ggrothendieck at gmail.com
Thu Mar 1 22:46:21 CET 2007


On 3/1/07, Bart Joosen <Bartjoosen at hotmail.com> wrote:
> Dear All,
>
> Thanks for the replies. Jim Holtman has given a solution which fits my
> needs, and Gabor Grothendieck's does the same thing, but it looks like
> his code will allow faster processing (I will check this tomorrow on a
> big data file).
>
> @gabor: I don't understand the use of the grep command:
>        grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> What is this expression  ("^[1-9][0-9. ]*$|Time") actually doing?
> I looked in the help page, but couldn't find a suitable answer.

I briefly discussed it in the first paragraph of my response.  It
matches and returns only those lines that start (^ matches start of
line) with a nonzero digit, i.e. [1-9], and contain only digits, dots
and spaces, i.e. [0-9. ]*, up to the end of the line ($ matches end of
line), or (| means or) that contain the word Time anywhere in the line.

If you don't have lines like ... (which you did in your example) then
the regexp could be simplified to "^[0-9. ]+$|Time".  You may need to
match tabs too if your input contains those.
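
To see the difference between the patterns, here is a small sketch. The vector x is made up for illustration (modelled on the example file, plus one tab-separated line that was not in the original data), and the tab variant of the pattern is only one possible way to allow tabs:

```r
# Sample lines modelled on the example file; the last one uses a tab
x <- c("$$ Experiment Number:", "FUNCTION 1", "Scan            1",
       "Retention Time  0.017", "", "399.8112        184", "....",
       "399.8742\t0")

# Original pattern: keeps the Time line and the space-separated data line,
# drops headers, blank lines, the "...." line and the tab-separated line
orig <- grep("^[1-9][0-9. ]*$|Time", x, value = TRUE)

# Simplified pattern: a line may now start with a dot, so "...." is kept too
simple <- grep("^[0-9. ]+$|Time", x, value = TRUE)

# Variant that also allows tabs inside the data lines
tabs <- grep("^[1-9][0-9. \t]*$|Time", x, value = TRUE)
```

Printing orig, simple and tabs shows which lines each pattern lets through.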

>
>
> Thanks to All
>
>
> Bart
>
> ----- Original Message -----
> From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
> To: "Bart Joosen" <bartjoosen at hotmail.com>
> Cc: <r-help at stat.math.ethz.ch>
> Sent: Thursday, March 01, 2007 6:35 PM
> Subject: Re: [R] How to read in this data format?
>
>
> > Read in the data using readLines, extract out
> > all desired lines (namely those containing only
> > numbers, dots and spaces or those with the
> > word Time) and remove Retention from all
> > lines so that all remaining lines have two
> > fields.  Now that we have desired lines
> > and all lines have two fields read them in
> > using read.table.
> >
> > Finally, split them into groups and restructure
> > them using "by" and in the last line we
> > convert the "by" output to a data frame.
> >
> > At the end we display an alternate function f
> > for use with by should we wish to generate long
> > rather than wide output (using the terminology
> > of the reshape command).
> >
> >
> > Lines <- "$$ Experiment Number:
> > $$ Associated Data:
> >
> > FUNCTION 1
> >
> > Scan            1
> > Retention Time  0.017
> >
> > 399.8112        184
> > 399.8742        0
> > 399.9372        152
> > ....
> >
> > Scan            2
> > Retention Time  0.021
> >
> > 399.8112        181
> > 399.8742        1
> > 399.9372        153
> > "
> >
> > # replace next line with: Lines. <- readLines("myfile.dat")
> > Lines. <- readLines(textConnection(Lines))
> > Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> > Lines. <- gsub("Retention", "", Lines.)
> >
> > DF <- read.table(textConnection(Lines.), as.is = TRUE)
> > closeAllConnections()
> >
> > f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
> > out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
> > as.data.frame(do.call("rbind", out.by))
> >
> >
> > We could alternately consider producing long
> > format by replacing the function f with:
> >
> > f <- function(x) data.frame(x[-1,], id = x[1,2])
> >
> >
> > On 3/1/07, Bart Joosen <bartjoosen at hotmail.com> wrote:
> >> Hi,
> >>
> >> I received an ASCII file containing the following information:
> >>
> >> $$ Experiment Number:
> >> $$ Associated Data:
> >>
> >> FUNCTION 1
> >>
> >> Scan            1
> >> Retention Time  0.017
> >>
> >> 399.8112        184
> >> 399.8742        0
> >> 399.9372        152
> >> ....
> >>
> >> Scan            2
> >> Retention Time  0.021
> >>
> >> 399.8112        181
> >> 399.8742        1
> >> 399.9372        153
> >> .....
> >>
> >>
> >> I would like to import this data in R into a dataframe, where there is a
> >> column time, the first numbers as column names, and the second numbers as
> >> data in the dataframe:
> >>
> >> Time    399.8112        399.8742        399.9372
> >> 0.017   184     0       152
> >> 0.021   181     1       153
> >>
> >> I did take a look at read.table, read.delim, scan, ... but I've no
> >> idea about how to solve this problem.
> >>
> >> Anyone?
> >>
> >>
> >> Thanks
> >>
> >> Bart
> >>
> >> ______________________________________________
> >> R-help at stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
>
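
For reference, the long-format variant mentioned in the quoted message can be run end to end roughly as follows. This is only a sketch under some assumptions: it uses the text= argument of read.table (available in more recent versions of R) in place of textConnection, and the column names key and value, the variable grp and the function name f_long are made up for illustration.

```r
# The example data from the thread, one element per input line
Lines <- c("$$ Experiment Number:", "$$ Associated Data:", "", "FUNCTION 1", "",
           "Scan            1", "Retention Time  0.017", "",
           "399.8112        184", "399.8742        0", "399.9372        152",
           "....", "",
           "Scan            2", "Retention Time  0.021", "",
           "399.8112        181", "399.8742        1", "399.9372        153")

# Keep the data lines and the Time lines, then drop the word Retention
# so that every remaining line has exactly two fields
keep <- grep("^[1-9][0-9. ]*$|Time", Lines, value = TRUE)
keep <- gsub("Retention", "", keep)

# Read the two fields; as.is = TRUE keeps the first column as character
DF <- read.table(text = keep, as.is = TRUE, col.names = c("key", "value"))

# Each Time line starts a new group
grp <- cumsum(DF$key == "Time")

# Long format: one row per data line, with the group's time stored in id
f_long <- function(x) data.frame(x[-1, ], id = x[1, 2])
long <- do.call("rbind", by(DF, grp, f_long))
```

The result long has one row per measurement, with the retention time repeated in the id column for every row of its scan.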


