[R] data file import - numbers and letters in a matrix(!)

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Thu Apr 12 17:34:00 CEST 2007


Here are the contents of my "testdata.txt":

-----------------------------------------------------
START OF HEIGHT DATA
S= 0    y=0.0 x=0.00000000
S= 0 y=0.1         x=0.00055643
  S= 9 y=4.9 x=1.67278117
   S= 9 y=5.0 x=1.74873257
S=10   y=0.0       x=0.00000000
     S=10    y=0.1 x=0.00075557
S=99 y=5.3    x=1.94719490
END OF HEIGHT DATA
-----------------------------------------------------

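If you have access to a shell, you can preprocess the input file for
read.delim with

grep -v "^START" testdata.txt | grep -v "^END" | sed 's/ //g' | 
sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

This writes tab-separated S, y, x columns to standard output (the \t in
the replacement assumes GNU sed). Since the output has no header line,
read it with read.delim(..., header=FALSE).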

or here is my ugly fix in R

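  my.read.file <- function(file){

   # read all lines, then drop the START/END marker lines
   v1 <- readLines( con=file, n=-1)
   v2 <- v1[ !grepl( "^START|^END", v1 ) ]   # safe even if nothing matches
   # remove all spaces, then turn the "S=", "y=", "x=" labels into separators
   v3 <- gsub(" ", "", v2)
   v4 <- gsub( "S=|y=|x=", " ", v3 )
   v5 <- gsub("^ ", "", v4)

   # split each line into its three numbers; one row per line
   m  <- t( sapply( strsplit(v5, split=" "), as.numeric ) )
   colnames(m) <- c("S", "y", "x" )
   return(m)
  }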

  my.read.file( "testdata.txt" )
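
If the real file holds several START/END blocks plus header lines such as
FILEDATE (as in the example quoted below), the same idea can be extended.
What follows is only a sketch: read.height.blocks is a hypothetical helper,
not an existing function, and it assumes every data line looks like
"S= 9 y=4.9 x=1.67278117". It returns one numeric matrix per block.

```r
# Hypothetical helper (not part of R or any package): parse every
# START/END block of a file into its own numeric matrix.
read.height.blocks <- function(file) {
  v <- readLines(file)

  # Number the blocks: the counter goes up by one at every START line.
  block <- cumsum(grepl("^\\s*START OF HEIGHT DATA", v))

  # Keep only data lines of the assumed form "S= 9 y=4.9 x=1.67278117";
  # this also drops headers such as FILEDATE and the END lines.
  pat  <- "^\\s*S=\\s*([-0-9.]+)\\s+y=\\s*([-0-9.]+)\\s+x=\\s*([-0-9.]+)\\s*$"
  keep <- grepl(pat, v)

  # Turn each data line into "S y x", then convert to numbers.
  txt <- sub(pat, "\\1 \\2 \\3", v[keep])
  m <- matrix(as.numeric(unlist(strsplit(txt, " "))), ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("S", "y", "x")))

  # One matrix per block.
  lapply(split(seq_len(nrow(m)), block[keep]),
         function(i) m[i, , drop = FALSE])
}

# Small made-up test file with a header line and two blocks.
tmp <- tempfile()
writeLines(c("FILEDATE:02.02.2007",
             "START OF HEIGHT DATA",
             "S= 0 y=0.0 x=0.00000000",
             "S= 9 y=4.9 x=1.67278117",
             "END OF HEIGHT DATA",
             "START OF HEIGHT DATA",
             "S=10 y=0.1 x=0.00075557",
             "END OF HEIGHT DATA"), tmp)

blocks <- read.height.blocks(tmp)
length(blocks)
blocks[[2]]
```

Matching the data lines directly, rather than deleting the marker lines,
means any other header or "..." lines in the file are ignored as well.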

Regards, Adai




Felix Wave wrote:
> Hello,
> I have a problem with the import of a data file. It seems very tricky.
> I have a text file (end of the mail). Every file has a different number of measurements 
> which start with "START OF HEIGHT DATA" and end with "END OF HEIGHT DATA".
> 
> I imported the file into a matrix, but the letters before the numbers are my problem 
> (S= ,S=,x=,y=).
> Because of the letters and the space after "S=" I get a different number
> of columns in my matrix, and with letters in my matrix I can't calculate.
> 
> 
> My question: is it possible to import the file to get 3 columns with numbers only and 
> no letters like x=, y=?
> 
> Thanks a lot
> Felix
> 
> 
> 
> 
> My R Code:
> ----------
> 
> # na.strings = "S="
> 
> Measure1 <- matrix(scan("data.dat", n= 5063 * 4, skip =   20, what = character() ), 5063, 3, byrow = TRUE)
> Measure2 <- matrix(scan("data.dat", n= 5063 * 4, skip = 5220, what = character() ), 5063, 3, byrow = TRUE)
> 
> 
> 
> My data file:
> -----------
> 
> FILEDATE:02.02.2007
> ...
> 
> START OF HEIGHT DATA
> S= 0 y=0.0 x=0.00000000
> S= 0 y=0.1 x=0.00055643
> ...
> S= 9 y=4.9 x=1.67278117
> S= 9 y=5.0 x=1.74873257
> S=10 y=0.0 x=0.00000000
> S=10 y=0.1 x=0.00075557
> ...
> S=99 y=5.3 x=1.94719490
> END OF HEIGHT DATA
> ...
> 
> START OF HEIGHT DATA
> S= 0 y=0.0 x=0.00000000
> S= 0 y=0.1 x=0.00055643
> 
> 
> 
> The imported matrix: 
>       [,1]           [,2]           [,3]           [,4]          
>  [6,] "S="           "9"            "y=4.9"        "x=1.67278117"
>  [7,] "S="           "9"            "y=5.0"        "x=1.74873257"
>  [8,] "S=10"         "y=0.0"        "x=0.00000000" "S=10"        
>  [9,] "y=0.1"        "x=0.00075557" "S=10"         "y=0.2"       
> [10,] "x=0.00277444" "S=10"         "y=0.3"        "x=0.00605958"
> 