[R] data file import - numbers and letters in a matrix(!)

Gabor Grothendieck ggrothendieck at gmail.com
Thu Apr 12 16:19:32 CEST 2007


Try pasting this into an R session:


Lines.raw <- "FILEDATE:02.02.2007
...

START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...

START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
"

# with a real file, the next line would be replaced by
#  something like: Lines <- readLines("myfile.dat")
Lines <- readLines(textConnection(Lines.raw))

# extract those lines that contain an =
Lines <- grep("=", Lines, value = TRUE)

# get col names by removing all but letters & spaces from line 1
cn <- gsub("[^a-zA-Z ]", "", Lines[1])
cn <- scan(textConnection(cn), what = "")

# remove anything that is not a number, dot or space and read in
Lines <- gsub("[^ .0-9]", "", Lines)
DF <- read.table(textConnection(Lines), col.names = cn)
closeAllConnections()
DF
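
The original question reads each measurement into its own object (Measure1, Measure2). If one data frame per "START OF HEIGHT DATA" block is wanted, the blocks can be numbered before the lines are filtered. What follows is only a sketch along the lines of the code above, not something from the thread: read_height_blocks and block are hypothetical names, and it assumes that every block is introduced by a line containing START OF HEIGHT DATA and that no line outside a block matters.

```r
# Hypothetical helper: returns a list with one data frame per
# "START OF HEIGHT DATA" block found in a character vector of lines.
read_height_blocks <- function(Lines) {
  # running block number: increases by 1 at each START marker
  block <- cumsum(grepl("START OF HEIGHT DATA", Lines, fixed = TRUE))
  # keep only lines that contain an = and lie after the first START marker
  keep <- grepl("=", Lines, fixed = TRUE) & block > 0
  Lines <- Lines[keep]
  block <- block[keep]
  # column names: remove all but letters and spaces from the first data line
  cn <- scan(text = gsub("[^a-zA-Z ]", "", Lines[1]), what = "", quiet = TRUE)
  # remove anything that is not a digit, dot or space, then parse
  DF <- read.table(text = gsub("[^ .0-9]", "", Lines), col.names = cn)
  # one data frame per block
  split(DF, block)
}

# small made-up input in the same layout as the file in the question
demo <- c("FILEDATE:02.02.2007",
          "START OF HEIGHT DATA",
          "S= 0 y=0.0 x=0.00000000",
          "S= 9 y=5.0 x=1.74873257",
          "S=10 y=0.1 x=0.00075557",
          "END OF HEIGHT DATA",
          "START OF HEIGHT DATA",
          "S= 0 y=0.0 x=0.00000000",
          "S= 0 y=0.1 x=0.00055643")
blocks <- read_height_blocks(demo)
length(blocks)
blocks[[2]]
```

With a real file, demo would be replaced by something like readLines("myfile.dat"). as.matrix(blocks[[1]]) gives a numeric matrix if a matrix is preferred over a data frame.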

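One caveat about stripping everything except digits, dots and spaces: a minus sign or an exponent such as e-04 would be removed as well. The data shown in the question contain neither, so the code above is fine for that file. If such values could occur, a possible variant is to extract the numeric tokens with a pattern instead. This is a sketch under that assumption; the second input line and the names num, tokens and M are made up for illustration.

```r
# Hypothetical example lines; the negative value and the exponent are
# invented to illustrate the pattern, they are not in the original data.
x <- c("S= 0 y=0.0 x=0.00000000",
       "S=10 y=-0.1 x=7.5557e-04")

# an optionally signed number with optional fraction and exponent
num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
tokens <- regmatches(x, gregexpr(num, x))

# each line is assumed to yield exactly three numbers (S, y, x)
M <- do.call(rbind, lapply(tokens, as.numeric))
colnames(M) <- c("S", "y", "x")
M
```

The result M is a plain numeric matrix with three columns, which is what the question asks for.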



On 4/12/07, Felix Wave <felix-wave at vr-web.de> wrote:
> Hello,
> I have a problem with the import of a data file. It seems very tricky.
> I have a text file (see the end of this mail). Every file has a different number of measurements,
> which start with "START OF HEIGHT DATA" and end with "END OF HEIGHT DATA".
>
> I imported the file into a matrix, but the letters before the numbers are my problem
> (S= ,S=,x=,y=).
> Because of the letters and the space after "S=", I get a varying number
> of columns in my matrix, and with letters in my matrix I can't do calculations.
>
>
> My question: is it possible to import the file so that I get only 3 columns of numbers,
> with no letters like x=, y=?
>
> Thanks a lot
> Felix
>
>
>
>
> My R Code:
> ----------
>
> # na.strings = "S="
>
> Measure1 <- matrix(scan("data.dat", n= 5063 * 4, skip =   20, what = character() ), 5063, 3, byrow = TRUE)
> Measure2 <- matrix(scan("data.dat", n= 5063 * 4, skip = 5220, what = character() ), 5063, 3, byrow = TRUE)
>
>
>
> My data file:
> -----------
>
> FILEDATE:02.02.2007
> ...
>
> START OF HEIGHT DATA
> S= 0 y=0.0 x=0.00000000
> S= 0 y=0.1 x=0.00055643
> ...
> S= 9 y=4.9 x=1.67278117
> S= 9 y=5.0 x=1.74873257
> S=10 y=0.0 x=0.00000000
> S=10 y=0.1 x=0.00075557
> ...
> S=99 y=5.3 x=1.94719490
> END OF HEIGHT DATA
> ...
>
> START OF HEIGHT DATA
> S= 0 y=0.0 x=0.00000000
> S= 0 y=0.1 x=0.00055643
>
>
>
> The imported matrix:
> >
>      [,1]           [,2]           [,3]           [,4]
>  [6,] "S="           "9"            "y=4.9"        "x=1.67278117"
>  [7,] "S="           "9"            "y=5.0"        "x=1.74873257"
>  [8,] "S=10"         "y=0.0"        "x=0.00000000" "S=10"
>  [9,] "y=0.1"        "x=0.00075557" "S=10"         "y=0.2"
> [10,] "x=0.00277444" "S=10"         "y=0.3"        "x=0.00605958"
>


