[R] Reading a file w/ two delimiters

jim holtman jholtman at gmail.com
Fri Nov 18 15:28:47 CET 2011


It is pretty straightforward in R:

> con <- textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv")
> x <- readLines(con)
> close(con)  # close just this connection rather than closeAllConnections()
> # convert tabs to newlines so that each record sits on its own line
> x <- gsub("\t", "\n", x)
> # write out to a temp file and then read in as a data frame
> myFile <- tempfile()
> writeLines(x, con = myFile)
> x.df <- read.table(myFile, sep = "|")
> unlink(myFile)  # delete the temp file
> x.df
    V1   V2     V3
1 sadf asdf   asdf
2 qwer qwer   qwer
3 zxcv zxcv zxfcgv
>
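read.table() also accepts a text= argument (available in current R releases), which lets you skip the temporary file entirely. The following is a minimal sketch, assuming the whole tab-separated input is already in one character string x; it uses the same tab-to-newline substitution as the transcript above.

```r
# Sample input: records separated by tabs, fields separated by "|"
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Turn each tab into a newline, then parse the resulting string directly
# through read.table()'s `text` argument (no temp file on disk)
x.df <- read.table(text = gsub("\t", "\n", x, fixed = TRUE),
                   sep = "|", stringsAsFactors = FALSE)
print(x.df)
```

Internally, text= is wrapped in a text connection, so the embedded newlines act as record boundaries just as they would in a file.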

On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
<Jim.Langston at compuware.com> wrote:
> Thanks Paul,
>
> That's the path I was marching down; I was hoping for something
> a little cleaner. I do the same thing with Perl or Java.
>
> Jim
>
> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiemstra at knmi.nl> wrote:
>
>>Hi Jim,
>>
>>You can read the text file using readLines. This puts each line of the
>>file into an element of a character vector. Then you can go through the
>>lines manually (e.g. using grep, sub, strsplit) and create your data.frame.
>>
>>cheers,
>>Paul
>>
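The manual route Paul describes can be sketched with strsplit() alone. This is one possible version, not his code. It assumes every record has the same number of "|"-separated fields; otherwise rbind() recycles the shorter rows and emits a warning.

```r
# Records are separated by tabs, fields within a record by "|"
x <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# 1. Split the single string into records on the tab character
records <- strsplit(x, "\t", fixed = TRUE)[[1]]

# 2. Split each record into fields; fixed = TRUE matters here because "|"
#    is the regex alternation operator and would not match a literal bar
fields <- strsplit(records, "|", fixed = TRUE)

# 3. Stack the per-record field vectors row-wise into a character matrix,
#    then convert it to a data frame (columns are named V1, V2, ...)
x.df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
print(x.df)
```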
>>On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>> Hi all,
>>>
>>> I've been scratching and poking, but basically, the file I need to read
>>> has two delimiters that I need to contend with. The first is that the
>>> file contains tabs (\t) instead of newlines (\n), and the second is that
>>> the fields use | as the separator. I can easily do a read if I first
>>> convert the \t to \n and then use read.table to read the file with the
>>> | separator. But what I would really like to do is do this all within
>>> R. I have a lot of files to read and do analysis on.
>>>
>>> I can read the data into a table using \t as the delimiter, but can't
>>> figure out how to take that table data and use the | for separation.
>>> I've looked at string splits, etc., but haven't figured out how to split
>>> the whole table.
>>>
>>> Any thoughts? Hints?
>>>
>>> Thanks,
>>>
>>> Jim
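Since Jim has many files to process, one option is to wrap the conversion in a function and apply it to every file. This is a minimal sketch; the helper name read_tab_pipe, the demo file names, and the "\\.txt$" pattern are hypothetical. It assumes that any real newlines in a file should also be treated as record breaks. It also assumes that quotes and "#" in the data are ordinary characters, which is why quote = "" and comment.char = "" are set.

```r
# Hypothetical helper: read one file whose records are tab-separated and
# whose fields are "|"-separated
read_tab_pipe <- function(path) {
  # warn = FALSE: a missing trailing newline is expected for such files
  txt <- readLines(path, warn = FALSE)
  # Rejoin physical lines, then turn the tab record breaks into newlines
  txt <- gsub("\t", "\n", paste(txt, collapse = "\n"), fixed = TRUE)
  read.table(text = txt, sep = "|", quote = "", comment.char = "",
             stringsAsFactors = FALSE)
}

# Demo: write two small files into a temporary directory
dir <- tempfile("tabpipe")
dir.create(dir)
writeLines("a|b|c\td|e|f", file.path(dir, "one.txt"))
writeLines("g|h|i", file.path(dir, "two.txt"))

# Read every matching file and stack the results into one data frame
files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
all.df <- do.call(rbind, lapply(files, read_tab_pipe))
print(all.df)

unlink(dir, recursive = TRUE)  # clean up the demo files
```

If the files do not all share the same columns, keep the result of lapply() as a list instead of calling rbind() on it.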
>>>
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>--
>>Paul Hiemstra, Ph.D.
>>Global Climate Division
>>Royal Netherlands Meteorological Institute (KNMI)
>>Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>P.O. Box 201 | 3730 AE | De Bilt
>>tel: +31 30 2206 494
>>
>>http://intamap.geo.uu.nl/~paul
>>http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


