[R] Reading a file w/ two delimiters
David Winsemius
dwinsemius at comcast.net
Fri Nov 18 16:26:57 CET 2011
On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
> It is pretty straightforward in R:
>
>> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x <- gsub("\t", "\n", x)
Did the rules get liberalized for escaping patterns? Or have I been
unnecessarily expending backslashes all these years? I thought that
one needed three backslashes. This code does work, and I am wondering
if/when I "didn't get the memo". (I do see that there is a line early in
the ?regex page that suggests I have been deluded all along.)
"The current implementation interprets \a as BEL, \e as ESC, \f as FF,
\n as LF, \r as CR and \t as TAB."
> x <- readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
> closeAllConnections()
> # convert tabs to newlines
> x2 <- gsub("\\\t", "\n", x)
> x2
[1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
So I guess my question is now: why does the triple-backslash technique
work at all?
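One way to see why all three spellings behave the same is to look at what the regex engine actually receives once R has parsed each string literal. The sketch below uses only base R (nchar(), gsub(), identical()). It relies on the ?regex line quoted above (\t is interpreted as TAB) and on the output above, which shows that a backslash followed by a literal TAB also matches a TAB.

```r
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# What the regex engine receives after R parses each string literal:
nchar("\t")    # 1: a single literal TAB character
nchar("\\t")   # 2: a backslash followed by the letter t
nchar("\\\t")  # 2: a backslash followed by a literal TAB

# A literal TAB in the pattern matches itself
p1 <- gsub("\t", "\n", x)
# The regex engine translates the escape \t into TAB
p2 <- gsub("\\t", "\n", x)
# A backslash before an ordinary character (here a TAB) matches that character
p3 <- gsub("\\\t", "\n", x)

identical(p1, p2)
identical(p1, p3)
```

In other words, the extra backslashes are harmless rather than required: the one-backslash form "\t" already puts a real TAB into the pattern.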
--
David.
>> # write out to a temp file and then read in as a data frame
>> myFile <- tempfile()
>> writeLines(x, con = myFile)
>> x.df <- read.table(myFile, sep = "|")
>>
>>
>> x.df
>     V1   V2     V3
> 1 sadf asdf   asdf
> 2 qwer qwer   qwer
> 3 zxcv zxcv zxfcgv
>>
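The temporary file can be avoided by splitting the records in memory. The sketch below assumes an R version in which read.table() accepts a text argument (a character vector read as lines of input). It splits on the TAB record delimiter with strsplit(..., fixed = TRUE) and leaves the | field separator to read.table(). stringsAsFactors = FALSE is set explicitly so the result does not depend on the version's default.

```r
# Sample data in the layout from the thread: records separated by TABs,
# fields separated by "|"
raw <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Split on the record delimiter in memory instead of writing a temp file
records <- strsplit(raw, "\t", fixed = TRUE)[[1]]

# Each element of 'records' is treated as one line; sep handles the fields
x.df <- read.table(text = records, sep = "|", stringsAsFactors = FALSE)
x.df
```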
>
> On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
> <Jim.Langston at compuware.com> wrote:
>> Thanks Paul,
>>
>> That's the path I was marching down; I was hoping for something
>> a little cleaner. I do the same with Perl or Java.
>>
>> Jim
>>
>> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiemstra at knmi.nl> wrote:
>>
>>> Hi Jim,
>>>
>>> You can read the text file using readLines. This puts each line in the
>>> file into an element of a character vector. Then you can go through the
>>> lines manually (e.g. using grep, sub, strsplit) and create your
>>> data.frame.
>>>
>>> cheers,
>>> Paul
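The readLines()/strsplit() route suggested here can be sketched in base R as follows. It assumes the records are already held in a single string (for example, as returned by readLines() on such a file) and that every record has the same number of fields. Note that fixed = TRUE matters for the | split: as a regular expression, | means alternation, not a literal bar.

```r
raw <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Step 1: split the TAB-delimited string into one element per record
records <- strsplit(raw, "\t", fixed = TRUE)[[1]]

# Step 2: split each record on "|"; fixed = TRUE treats "|" literally
fields <- strsplit(records, "|", fixed = TRUE)

# Step 3: stack the records into a character matrix, one row per record
# (assumes every record has the same number of fields)
m <- do.call(rbind, fields)

# Step 4: convert to a data frame; unnamed columns become V1, V2, ...
x.df <- as.data.frame(m, stringsAsFactors = FALSE)
x.df
```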
>>>
>>> On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>>> Hi all,
>>>>
>>>> I've been scratching and poking, but basically, the file I need to read
>>>> has two delimiters that I need to contend with. The first is that the
>>>> file contains tabs (\t) instead of newlines (\n), and the second is that
>>>> the fields have | as the separator. I can easily do a read if I first
>>>> convert the \t to \n and then use read.table to read the file with the
>>>> | separator. But what I would really like to do is do this all within
>>>> R. I have a lot of files to read and do analysis on.
>>>>
>>>> I can read the data into a table using \t as the delimiter, but can't
>>>> figure out how to take that table data and use the | for separation.
>>>> I've looked at string splits, etc., but haven't figured out how to split
>>>> the whole table.
>>>>
>>>> Any thoughts? Hints?
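Because the goal is to process many such files, the split-then-read steps can be wrapped in a function and applied to each file. This is a sketch, not code from the thread: read_two_delim() is a hypothetical helper name, and it assumes read.table() accepts a text argument. Splitting on TABs after readLines() means files that also contain real newlines are still handled.

```r
# Hypothetical helper: read one file whose records are separated by TABs
# and whose fields are separated by "|"
read_two_delim <- function(path) {
  # warn = FALSE: such files often lack a final newline
  lines <- readLines(path, warn = FALSE)
  # Turn every TAB into a record break (any real newlines already split lines)
  records <- unlist(strsplit(lines, "\t", fixed = TRUE))
  # Drop empty records, e.g. from consecutive TABs
  records <- records[nzchar(records)]
  read.table(text = records, sep = "|", stringsAsFactors = FALSE)
}

# Example usage over a directory (the path and pattern are placeholders):
# files  <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
# tables <- lapply(files, read_two_delim)
```

lapply() returns one data frame per file. If the files share a layout, do.call(rbind, tables) combines them.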
>>>>
>>>> Thanks,
>>>>
>>>> Jim
>>>>
>>>>
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> --
>>> Paul Hiemstra, Ph.D.
>>> Global Climate Division
>>> Royal Netherlands Meteorological Institute (KNMI)
>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>> P.O. Box 201 | 3730 AE | De Bilt
>>> tel: +31 30 2206 494
>>>
>>> http://intamap.geo.uu.nl/~paul
>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>>
>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
David Winsemius, MD
West Hartford, CT