[R] regular expression

Uwe Ligges ligges at statistik.uni-dortmund.de
Sat Apr 7 13:18:44 CEST 2007



Laurent Rhelp wrote:
> Uwe Ligges wrote:
> 
>>
>>
>> Laurent Rhelp wrote:
>>
>>> Dear R-List,
>>>
>>>      I have a great many files in a directory and I would like to 
>>> replace, in every file, the character " with the character ' and, at the 
>>> same time, change ' to '' (i.e. the character ' twice, not the single 
>>> character ") whenever the character ' is enclosed in "....."
>>>   So, "....." becomes '.....' and ".....'......" becomes '.....''......'
>>> Regular expressions could certainly help me, but I am not able to use them.
>>>
>>> How can I do that with R?
>>
>>
>>
>> In fact, you do not need to know anything about regular expressions in 
>> this case, since you are simply going to replace certain characters with 
>> others without any fuzzy restrictions:
>>
>> x <- "\".....'......\""
>> cat(x, "\n")
>> xn <- gsub('"', "'", gsub("'", "''", x))
>> cat(xn, "\n")
>>
>>
>> Uwe Ligges
>>
>>
>>> Thank you very much
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
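Two remarks on the one-liner above, as a hedged sketch (the helper names swap_quotes() and swap_quotes_inside() are hypothetical, not part of base R). First, gsub() treats its pattern as a regular expression by default; for " and ' this makes no difference, but fixed = TRUE makes the literal match explicit. The order of the two substitutions matters: ' has to be doubled before " is turned into ', otherwise the newly created ' characters would be doubled as well. Second, the one-liner doubles every ', not only those inside "....." as the question asks. One possible way to restrict the doubling, assuming the double quotes on each line are balanced and not nested, is to rewrite only the "....." segments via regmatches():

```r
## Hypothetical helper: literal (non-regex) version of the one-liner above.
swap_quotes <- function(x) {
  x <- gsub("'", "''", x, fixed = TRUE)  # first double every '
  gsub('"', "'", x, fixed = TRUE)        # then turn every " into '
}

## Hypothetical helper: only rewrite text between a pair of double quotes.
## Assumes balanced, non-nested quotes; an unpaired " is left untouched.
swap_quotes_inside <- function(x) {
  m <- gregexpr('"[^"]*"', x)                # locate every "....." segment
  regmatches(x, m) <- lapply(regmatches(x, m), function(s) {
    if (length(s) == 0) return(s)            # line without any "....."
    inner <- substr(s, 2, nchar(s) - 1)      # drop the enclosing "
    sprintf("'%s'", gsub("'", "''", inner, fixed = TRUE))
  })
  x
}

x <- "\".....'......\" and it's"
swap_quotes(x)         # "'.....''......' and it''s"
swap_quotes_inside(x)  # "'.....''......' and it's"
```

Both helpers are vectorized, so they can be applied directly to the character vector returned by readLines().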
> 
> Yes, you are right. So I wrote the code below (which I find a little 
> awkward, but it works).
> 
> ##-----
> 
> dirdata <- getwd()
> fichnames <- list.files(path=paste(dirdata,"\\initial\\",sep=""))

See ?file.path to improve the above.
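For illustration, a minimal sketch of what file.path() buys here: it inserts the separator itself ("/", which R also accepts on Windows), so there is no need to paste "\\initial\\" by hand. The path "C:/data" and the file name "file1.txt" are made up for the example.

```r
## file.path() joins the components with "/" on every platform
file.path("C:/data", "initial", "file1.txt")
## [1] "C:/data/initial/file1.txt"

## Laurent's two lines, rewritten with file.path()
dirdata <- getwd()
fichnames <- list.files(file.path(dirdata, "initial"))
```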


> for( i in 1:length(fichnames)){

See ?seq to improve the above: seq(along = fichnames).
Or, even better, just loop over the names themselves (see below).
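Why this matters, as a small sketch: if the directory happens to be empty, 1:length(fichnames) is 1:0, i.e. c(1, 0), so the loop body would run twice with nonsensical indices. seq(along = fichnames), or its shorthand seq_along(fichnames), and looping over the names directly both skip the loop instead.

```r
fichnames <- character(0)      # e.g. an empty "initial" directory
1:length(fichnames)            # 1 0: the loop would run with i = 1 and i = 0
seq_along(fichnames)           # integer(0): the loop is skipped
for (f in fichnames) print(f)  # looping over the names: also skipped
```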

>      filein <- paste(dirdata,"\\initial\\",fichnames[i],sep="")

Again, file.path() is your friend.

>      conin <- file(filein)
>      open(conin)        
>      nbrows <- length( readLines(conin,n=-1) )
>      close(conin)

You can simply call readLines() with the file name; it then opens (and 
closes) the connection to the file itself. Also, I do not see why you need 
to read the whole file here just to count its lines. Since your code is 
getting rather complicated, let me suggest the following procedure instead 
(untested!):

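dirdata <- getwd()
fichnames <- list.files(file.path(dirdata, "initial"))
## writeLines() will not create a missing directory, so do it here
dir.create(file.path(dirdata, "result"), showWarnings = FALSE)
for(i in fichnames){
     ## read all lines at once; readLines() opens and closes the file
     temp <- readLines(file.path(dirdata, "initial", i))
     ## double ' first, then turn " into '
     temp <- gsub('"', "'", gsub("'", "''", temp))
     writeLines(temp, con = file.path(dirdata, "result", i))
}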
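Since the procedure above is untested, here is one way to try it without touching real data: a sketch that builds a throwaway directory under tempdir() with the same initial/ and result/ layout, runs the same loop, and reads the result back. The directory name "quotetest", the file name "sample.txt" and its two lines are made up for the test.

```r
## Throwaway test layout; "quotetest" and "sample.txt" are made up.
dirdata <- file.path(tempdir(), "quotetest")
dir.create(file.path(dirdata, "initial"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(dirdata, "result"), showWarnings = FALSE)
writeLines(c("say \"hello\"", "\".....'......\""),
           file.path(dirdata, "initial", "sample.txt"))

## The same loop as in the procedure above
fichnames <- list.files(file.path(dirdata, "initial"))
for (i in fichnames) {
  temp <- readLines(file.path(dirdata, "initial", i))
  temp <- gsub('"', "'", gsub("'", "''", temp))
  writeLines(temp, con = file.path(dirdata, "result", i))
}

## Expected: "say 'hello'" and "'.....''......'"
readLines(file.path(dirdata, "result", "sample.txt"))
```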

Uwe Ligges





>      fileout <- paste(dirdata,"\\result\\",fichnames[i],sep="")
>      conout <- file(fileout,"w")
> 
>      conin <- file(filein)
>      open(conin)
> 
> 
>      for( l in 1:nbrows )
>      {
>        text <- gsub('"',"'",gsub("'","''",readLines(conin,n=1)))
>        writeLines(con=conout,text=text)
>      }
> 
>      close(conin)
>      close(conout)
>  }
> 
> ##------


