[R] gsub with regular expression

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jun 25 17:11:21 CEST 2010

On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk
<residuo.solow at gmail.com> wrote:
> If I have a text with 7 words per line and I would like to put first
> and second word joined in a vector and the rest of words one per
> column in a matrix how can I do it?
> First 2 lines of my text file:
> "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido"
> "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"
> Results:
> Vector:
> 2008/12/31 12:23:31
> 2010/02/01 02:35:31
> Matrix
> "numero" 343.233.233 "Rodeo"   "Vaca"   "Ruido"
> "palabra" 111.111.222 "abejorro" "Rodeo" "Vaca"

Here are two solutions.  Each is three statements long (read in the
data, display the vector, display the matrix).  To read from a file
instead, replace textConnection(Lines) with the file name, "myfile.dat",
say, in each.

1. Here is a sub solution:

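# Lines is the two-line character string defined under solution 2 below
L <- readLines(textConnection(Lines))
sub("(\\S+ \\S+) .*", "\\1", L)  # the vector: first two words joined
sub("\\S+ \\S+ ", "", L)         # the rest of each line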
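
The second sub() call returns the remaining words as one string per line. If a matrix with one word per column is wanted, as in the question, the strings can be split afterwards. A minimal base R sketch, assuming the words are separated by single spaces and every line has the same number of words (the names v, rest and m are only for illustration):

```r
# Same two sample lines as in the post
Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

con <- textConnection(Lines)
L <- readLines(con)
close(con)

# the vector: first two words joined
v <- sub("^(\\S+ \\S+) .*", "\\1", L)

# the matrix: drop the first two words, split the rest on single
# spaces and stack the pieces row-wise
rest <- sub("^\\S+ \\S+ ", "", L)
m <- do.call(rbind, strsplit(rest, " ", fixed = TRUE))

v
m
```

Here m is a character matrix, so the 343.233.233 style values stay as text.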

2. Here is a solution using zoo:

Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"


library(zoo)
z <- read.zoo(textConnection(Lines), index = 1:2,
           FUN = function(x) paste(x[,1], x[,2]))

time(z) # the vector
coredata(z) # the matrix

Another possibility would be to convert to chron or POSIXct at the
same time as reading it in:

# chron
library(chron)
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.chron(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))

# POSIXct
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.POSIXct(paste(x[,1], x[,2]),
                              format = "%Y/%m/%d %H:%M:%S"))
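
The conversion to POSIXct can also be done in base R once the date-time strings have been extracted, for example with the sub solution. A minimal sketch, assuming the two sample stamps from the question; tz = "UTC" is an assumption, since the post does not say which time zone the data are in:

```r
# Date-time strings as produced by joining the first two words
stamps <- c("2008/12/31 12:23:31", "2010/02/01 02:35:31")

# tz = "UTC" is an assumption; use the time zone of the actual data
tt <- as.POSIXct(stamps, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

tt
diff(tt)  # elapsed time between the two records
```

A string that does not match the format comes back as NA, so checking any(is.na(tt)) after the conversion is a cheap way to catch malformed lines.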
