[R] How to read in this data format?

Bart Joosen Bartjoosen at hotmail.com
Thu Mar 1 21:28:49 CET 2007


Dear All,

thanks for the replies. Jim Holtman's solution fits my needs, and Gabor 
Grothendieck's does the same thing, but his code looks like it will allow 
faster processing (I will check this tomorrow on a big data file).

@gabor: I don't understand the use of the grep command:
        grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
What is this expression  ("^[1-9][0-9. ]*$|Time") actually doing?
I looked in the help page, but couldn't find a suitable answer.
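
For illustration, the pattern can be tried on a handful of lines. This is a minimal sketch, assuming the lines look like those in the example file; the vector name sample_lines and the choice of lines are made up for this illustration and are not part of the posted solution.

```r
# Hypothetical sample lines, modelled on the example file in this thread
sample_lines <- c("$$ Experiment Number:",
                  "FUNCTION 1",
                  "Scan            1",
                  "Retention Time  0.017",
                  "",
                  "399.8112        184",
                  "....")

# The pattern is two alternatives joined by "|":
#   ^[1-9][0-9. ]*$  the line starts with a digit 1-9 and then holds only
#                    digits, dots and spaces up to the end of the line
#   Time             the line contains the text "Time" anywhere
# value = TRUE returns the matching lines themselves instead of their indices.
kept <- grep("^[1-9][0-9. ]*$|Time", sample_lines, value = TRUE)
print(kept)
```

Only the "Retention Time" line and the purely numeric line should survive; headers, blank lines, "Scan" lines and the "...." filler match neither alternative.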


Thanks to All


Bart

----- Original Message ----- 
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
To: "Bart Joosen" <bartjoosen at hotmail.com>
Cc: <r-help at stat.math.ethz.ch>
Sent: Thursday, March 01, 2007 6:35 PM
Subject: Re: [R] How to read in this data format?


> Read in the data using readLines, extract
> all desired lines (namely those containing only
> numbers, dots and spaces, or those with the
> word Time) and remove Retention from all
> lines so that all remaining lines have two
> fields.  Now that we have the desired lines
> and all of them have two fields, read them in
> using read.table.
>
> Finally, split them into groups and restructure
> them using "by" and in the last line we
> convert the "by" output to a data frame.
>
> At the end we display an alternate function f
> for use with by should we wish to generate long
> rather than wide output (using the terminology
> of the reshape command).
>
>
> Lines <- "$$ Experiment Number:
> $$ Associated Data:
>
> FUNCTION 1
>
> Scan            1
> Retention Time  0.017
>
> 399.8112        184
> 399.8742        0
> 399.9372        152
> ....
>
> Scan            2
> Retention Time  0.021
>
> 399.8112        181
> 399.8742        1
> 399.9372        153
> "
>
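> # replace next line with: Lines. <- readLines("myfile.dat")
> Lines. <- readLines(textConnection(Lines))
> # keep lines made up only of numbers, dots and spaces, or containing "Time"
> Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines., value = TRUE)
> # drop "Retention" so that every remaining line has two fields
> Lines. <- gsub("Retention", "", Lines.)
>
> DF <- read.table(textConnection(Lines.), as.is = TRUE)
> closeAllConnections()
>
> # f turns one group into a row: the time as id, then the values named by column 1
> f <- function(x) c(id = x[1,2], structure(x[-1,2], .Names = x[-1,1]))
> # a new group starts at each "Time" line
> out.by <- by(DF, cumsum(DF[,1] == "Time"), f)
> as.data.frame(do.call("rbind", out.by))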
>
>
> We could alternately consider producing long
> format by replacing the function f with:
>
> f <- function(x) data.frame(x[-1,], id = x[1,2])
>
>
> On 3/1/07, Bart Joosen <bartjoosen at hotmail.com> wrote:
>> Hi,
>>
>> I received an ASCII file containing the following information:
>>
>> $$ Experiment Number:
>> $$ Associated Data:
>>
>> FUNCTION 1
>>
>> Scan            1
>> Retention Time  0.017
>>
>> 399.8112        184
>> 399.8742        0
>> 399.9372        152
>> ....
>>
>> Scan            2
>> Retention Time  0.021
>>
>> 399.8112        181
>> 399.8742        1
>> 399.9372        153
>> .....
>>
>>
>> I would like to import this data into a data frame in R, with a column
>> Time, the first numbers as column names, and the second numbers as the
>> data:
>>
>> Time    399.8112        399.8742        399.9372
>> 0.017   184     0       152
>> 0.021   181     1       153
>>
>> I took a look at read.table, read.delim, scan, ... but I have no idea
>> how to solve this problem.
>>
>> Anyone?
>>
>>
>> Thanks
>>
>> Bart
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
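
The long-format variant mentioned in the quoted message can be sketched as follows. This is only an illustration under assumptions: the input is a shortened, made-up copy of the example file, and read.table(text = ...) is used here in place of textConnection so that no connections need closing.

```r
# Shortened sample input, modelled on the example file in this thread
Lines <- c("Scan            1",
           "Retention Time  0.017",
           "",
           "399.8112        184",
           "399.8742        0",
           "Scan            2",
           "Retention Time  0.021",
           "",
           "399.8112        181",
           "399.8742        1")

# Keep the numeric lines and the "Time" lines, then drop "Retention"
Lines. <- grep("^[1-9][0-9. ]*$|Time", Lines, value = TRUE)
Lines. <- gsub("Retention", "", Lines.)
DF <- read.table(text = Lines., as.is = TRUE)

# Long format: one row per measurement, with the retention time
# repeated in the id column
f <- function(x) data.frame(x[-1, ], id = x[1, 2])
out.by <- by(DF, cumsum(DF[, 1] == "Time"), f)
long <- do.call("rbind", out.by)
print(long)
```

Each scan contributes one row per measurement, so the result has the columns V1, V2 and id rather than one column per first number.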


