[R] Reading a file w/ two delimiters

jim holtman jholtman at gmail.com
Fri Nov 18 16:03:30 CET 2011


The thing to watch out for is if your file is large: 'textConnection'
is very slow at providing the data stream for something like
read.table.  It is usually much faster to read in the file with
'readLines', preprocess the data, write it out to a tempfile and
then read it back in with 'read.table'.
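A minimal sketch of that readLines/tempfile workflow follows. It assumes the layout from the original question (records terminated by tabs instead of newlines, fields separated by "|"). The small sample file is hypothetical and stands in for a real, large file:

```r
# Hypothetical sample input: records end with a tab instead of a newline,
# fields are separated by "|".
infile <- tempfile(fileext = ".txt")
writeLines("aa|bb|cc\tdd|ee|ff\t", infile)

# 1. Read the raw text directly; no textConnection is involved.
raw <- readLines(infile)

# 2. Preprocess: turn every tab into a newline (fixed = TRUE, no regex).
cleaned <- gsub("\t", "\n", raw, fixed = TRUE)

# 3. Write the cleaned text to a temporary file.
tmp <- tempfile()
writeLines(cleaned, tmp)

# 4. Read it back with "|" as the field separator; the trailing blank
#    line is skipped because blank.lines.skip = TRUE by default.
df <- read.table(tmp, sep = "|", stringsAsFactors = FALSE)
unlink(c(infile, tmp))
print(df)
```

For a real file, replace `infile` with its path and drop the sample-writing line.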

On Fri, Nov 18, 2011 at 9:52 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:
>
>> Thanks Paul,
>>
>> That's the path I was marching down. I was hoping for something
>> a little cleaner; I do the same with Perl or Java.
>
>> tesfil <- "aa|bb|cc\tdd|ee|ff\t"
>
>> read.table(textConnection(gsub("\\\t", "\n", scan(
>               textConnection(tesfil), # substitute your file here
>               what="character")) ), sep="|")
> Read 2 items
>  V1 V2 V3
> 1 aa bb cc
> 2 dd ee ff
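By default, scan(what = "character") splits its input at any white space (sep = ""). In the one-liner above, the tabs are therefore already consumed by scan() before gsub() runs, and a field containing a space would be split as well. The sketch below splits only at tabs. It assumes the same tab/"|" layout, and the sample data with a space inside a field is hypothetical:

```r
# Hypothetical sample file; note the space inside the field "b b".
infile <- tempfile(fileext = ".txt")
writeLines("aa|b b|cc\tdd|ee|ff\t", infile)

# sep = "\t" makes scan() split only at tabs, so "b b" stays intact.
records <- scan(infile, what = "character", sep = "\t", quiet = TRUE)
# A trailing tab may produce an empty record; drop empties (and any NA).
records <- records[!is.na(records) & nzchar(records)]

# read.table(text = ...) treats each element as one input line. It uses a
# text connection internally, so this is only suited to small inputs.
df <- read.table(text = records, sep = "|", stringsAsFactors = FALSE)
unlink(infile)
print(df)
```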
>
>>
>> Jim
>>
>> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiemstra at knmi.nl> wrote:
>>
>>> Hi Jim,
>>>
>>> You can read the text file using readLines. This puts each line of the
>>> file into an element of a character vector. Then you can go through the
>>> lines manually (e.g. using grep, sub, strsplit) and create your data.frame.
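Here is a minimal sketch of that manual route. It assumes tab-terminated records and "|"-separated fields, and the sample file is hypothetical. Because "|" is a regular-expression metacharacter, strsplit() needs fixed = TRUE (or the escaped pattern "\\|") to split on a literal bar:

```r
# Hypothetical sample file in the tab/"|" layout.
infile <- tempfile(fileext = ".txt")
writeLines("aa|bb|cc\tdd|ee|ff\t", infile)

lines <- readLines(infile)
# Split each line into records at the tabs, dropping empty pieces ...
records <- unlist(strsplit(lines, "\t", fixed = TRUE))
records <- records[nzchar(records)]
# ... then split each record into fields at a literal "|".
fields <- strsplit(records, "|", fixed = TRUE)
# Bind the field vectors row-wise (assumes every record has the same
# number of fields) and convert the character matrix to a data frame.
df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
unlink(infile)
print(df)
```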
>>>
>>> cheers,
>>> Paul
>>>
>>> On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I've been scratching and poking, but basically, the file I need to read
>>>> has two delimiters that I need to contend with. The first is that the
>>>> file contains tabs (\t) instead of newlines (\n), and the second is that
>>>> the fields have | for the separators. I can easily do a read if I first
>>>> convert the \t to \n and then use read.table to get the file read with
>>>> the | separator. But what I would really like to do is do this all
>>>> within R. I have a lot of files to read and do analysis on.
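Because the same steps have to be repeated over many files, one option is to wrap them in a function and apply it to every file. In the sketch below, the function name read_tab_pipe(), the directory, and the file pattern are all hypothetical:

```r
# Hypothetical helper: reads one file whose records end with tabs and whose
# fields are separated by "|", via a temporary file; returns a data frame.
read_tab_pipe <- function(path) {
  raw <- readLines(path)
  records <- unlist(strsplit(raw, "\t", fixed = TRUE))
  records <- records[nzchar(records)]
  tmp <- tempfile()
  on.exit(unlink(tmp))  # clean up the temporary file when the function exits
  writeLines(records, tmp)
  read.table(tmp, sep = "|", stringsAsFactors = FALSE)
}

# Hypothetical directory with two sample files.
data_dir <- file.path(tempdir(), "tabpipe")
dir.create(data_dir, showWarnings = FALSE)
writeLines("aa|bb|cc\tdd|ee|ff\t", file.path(data_dir, "a.txt"))
writeLines("gg|hh|ii\t", file.path(data_dir, "b.txt"))

files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, read_tab_pipe)
names(results) <- basename(files)

# Optionally stack everything into one data frame (assumes the files
# all have the same columns).
all_data <- do.call(rbind, results)
print(all_data)
```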
>>>>
>>>> I can read the data into a table using \t as the delimiter, but can't
>>>> figure out how to take that table data and use the | for separation.
>>>> I've looked at string splits, etc. but haven't figured out how to split
>>>> the whole table.
>>>>
>>>> Any thoughts? Hints?
>>>>
>>>> Thanks,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>>
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> --
>>> Paul Hiemstra, Ph.D.
>>> Global Climate Division
>>> Royal Netherlands Meteorological Institute (KNMI)
>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>> P.O. Box 201 | 3730 AE | De Bilt
>>> tel: +31 30 2206 494
>>>
>>> http://intamap.geo.uu.nl/~paul
>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


