[R] Reading a file w/ two delimiters

David Winsemius dwinsemius at comcast.net
Fri Nov 18 16:26:57 CET 2011


On Nov 18, 2011, at 9:28 AM, jim holtman wrote:

> It is pretty straightforward in R:
>
>> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x <- gsub("\t", "\n", x)

Did the rules get liberalized for escaping patterns? Or have I been
unnecessarily expending backslashes all these years? I thought that
one needed 3 backslashes. This code does work, and I am wondering
if/when I "didn't get the memo". (I do see that there is a line early
in the ?regex page that suggests I have been deluded all along.)

"The current implementation interprets \a as BEL, \e as ESC, \f as FF,
\n as LF, \r as CR and \t as TAB."

 > x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
 > closeAllConnections()
 > # convert tabs to newlines
 > x2 <- gsub("\\\t", "\n", x)
 > x2
[1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"

So I guess my question is (now) why the triple-backslash technique
even works?
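A minimal check (not part of the original thread), assuming R's default TRE regex engine. In an R string literal "\t" is already one character, a real TAB, which the regex engine matches literally. "\\t" hands the engine a backslash plus `t`, which it interprets as TAB per the ?regex sentence quoted above. "\\\t" hands it a backslash plus a real TAB; the engine appears to treat an escaped ordinary character as that character itself, so it matches a TAB too. With fixed = TRUE no regex interpretation happens at all.

```r
# Example string from the thread: TAB between records, "|" between fields.
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

a <- gsub("\t", "\n", x)    # pattern is a literal TAB character
b <- gsub("\\t", "\n", x)   # pattern is backslash + "t"; regex reads it as TAB
e <- gsub("\\\t", "\n", x)  # pattern is backslash + literal TAB (an escaped TAB)
d <- gsub("\t", "\n", x, fixed = TRUE)  # plain substring replacement, no regex

# Number of characters the regex engine actually receives for each pattern.
nchar(c("\t", "\\t", "\\\t"))
# [1] 1 2 2

identical(a, b) && identical(b, e) && identical(e, d)
# [1] TRUE
```

So none of the three spellings is wrong; they only differ in whether R's string parser or the regex engine turns the escape into a TAB.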

-- 
David.



>> # write out to a temp file and then read in as a data frame
>> myFile <- tempfile()
>> writeLines(x, con = myFile)
>> x.df <- read.table(myFile, sep = "|")
>>
>>
>> x.df
>    V1   V2     V3
> 1 sadf asdf   asdf
> 2 qwer qwer   qwer
> 3 zxcv zxcv zxfcgv
>>
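A variant that skips the temporary file, assuming an R version whose read.table() has the `text` argument (R 2.14.0 or later, as far as I know). Splitting on TAB with strsplit(..., fixed = TRUE) gives one element per record, and read.table() then handles the | separator itself:

```r
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# One element per record; fixed = TRUE means the TAB is matched literally.
records <- strsplit(x, "\t", fixed = TRUE)[[1]]

# Parse the records straight from memory via read.table's text argument.
x.df <- read.table(text = records, sep = "|", stringsAsFactors = FALSE)
x.df
```

This should print the same three-row data frame as the temp-file version.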
>
> On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
> <Jim.Langston at compuware.com> wrote:
>> Thanks Paul,
>>
>> That's the path I was marching down; I was hoping for something
>> a little cleaner. I do the same with Perl or Java.
>>
>> Jim
>>
>> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiemstra at knmi.nl> wrote:
>>
>>> Hi Jim,
>>>
>>> You can read the text file using readLines. This puts each line in the
>>> file into an element of a character vector. Then you can go through the
>>> lines manually (e.g. using grep, sub, strsplit) and create your
>>> data.frame.
>>>
>>> cheers,
>>> Paul
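The readLines/strsplit route can be sketched without read.table() at all. This is only one way to do it and assumes every record has the same number of fields. Note the fixed = TRUE on the | split: unescaped, | is the regex alternation operator and would split the string into single characters.

```r
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Step 1: records are separated by TAB.
records <- strsplit(x, "\t", fixed = TRUE)[[1]]

# Step 2: fields are separated by "|" (matched literally, not as a regex).
fields <- strsplit(records, "|", fixed = TRUE)

# Step 3: stack the per-record vectors into a character matrix, one row
# per record, then convert; columns get the default names V1, V2, ...
m <- do.call(rbind, fields)
x.df <- as.data.frame(m, stringsAsFactors = FALSE)
x.df
```

All columns come out as character; type.convert() or as.numeric() can be applied afterwards where the data are numeric.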
>>>
>>> On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>>> Hi all,
>>>>
>>>> I've been scratching and poking, but basically, the file I need to
>>>> read has two delimiters that I need to contend with. The first is
>>>> that the file contains tabs (\t) instead of newlines (\n), and the
>>>> second is that the fields use | as the separator. I can easily do a
>>>> read if I first convert the \t to \n and then use read.table to read
>>>> the file with the | separator. But what I would really like to do is
>>>> do this all within R. I have a lot of files
>>>> to read and do analysis on.
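Since there are many files, one option is to wrap the steps in a function and lapply() it over the file list. `read_tab_pipe` and the `data` directory are hypothetical names; the sketch assumes plain-text files with TAB-separated records and |-separated fields, possibly mixed with ordinary newlines.

```r
# Hypothetical helper: read one file whose records are separated by TAB
# and whose fields are separated by "|". Extra arguments go to read.table().
read_tab_pipe <- function(path, ...) {
  # warn = FALSE: such files often lack a final newline.
  lines <- readLines(path, warn = FALSE)
  # Split every line on TAB; any real newlines were already split by readLines.
  records <- unlist(strsplit(lines, "\t", fixed = TRUE))
  # Drop empty records, e.g. from consecutive TABs.
  records <- records[nzchar(records)]
  read.table(text = records, sep = "|", stringsAsFactors = FALSE, ...)
}

# Demo: a temporary file stands in for one of the real data files.
tf <- tempfile(fileext = ".txt")
cat("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv", file = tf)
one <- read_tab_pipe(tf)
one

# For many files, e.g. every .txt file in a (hypothetical) "data" directory:
# files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
# all.df <- lapply(files, read_tab_pipe)
```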
>>>>
>>>> I can read the data into a table using \t as the delimiter, but I
>>>> can't figure out how to take that table data and use the | for
>>>> separation. I've looked at string splits, etc., but haven't figured
>>>> out how to split the whole table.
>>>>
>>>> Any thoughts? Hints?
>>>>
>>>> Thanks,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>> The contents of this e-mail are intended for the named addressee  
>> only. It contains information that may be confidential. Unless you  
>> are the named addressee or an authorized designee, you may not copy  
>> or use it, or disclose it to anyone else. If you received it in  
>> error please notify us immediately and then destroy it.
>>
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> --
>>> Paul Hiemstra, Ph.D.
>>> Global Climate Division
>>> Royal Netherlands Meteorological Institute (KNMI)
>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>> P.O. Box 201 | 3730 AE | De Bilt
>>> tel: +31 30 2206 494
>>>
>>> http://intamap.geo.uu.nl/~paul
>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>>
>>
>>
>
>
>
> -- 
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>

David Winsemius, MD
West Hartford, CT


