[R] Import fixed-format ascii file with mixed record types

David Winsemius dwinsemius at comcast.net
Tue Feb 2 20:21:11 CET 2010


On Feb 2, 2010, at 1:33 PM, trece por ciento wrote:

> Thanks again, David
> I think that this could work.
> Final questions:
> 1. I have read that read.fwf could be slow for big tables (my tables
> have approx. 160000 records, with a record length of 176 characters,
> almost 28 MB). Could that be a problem?

I don't know. Too many details are missing.
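
One way to fill in the missing details is to time read.fwf on a synthetic file shaped like the real one. The sketch below is only an illustration: the field layout (22 fields of 8 characters each, 176 characters in total) and the sample size are assumptions, not the survey's actual structure.

```r
# Synthetic benchmark; widths and n are assumed values for illustration only.
set.seed(1)
n <- 2000                       # small sample; raise towards 160000 to extrapolate
widths <- rep(8, 22)            # hypothetical layout, sum(widths) == 176

# Build n random all-digit records of 176 characters each
lines <- vapply(seq_len(n), function(i)
  paste(sample(0:9, 176, replace = TRUE), collapse = ""), character(1))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Time the import; colClasses = "character" keeps leading zeros intact
timing <- system.time(
  dat <- read.fwf(tmp, widths = widths, colClasses = "character")
)
print(timing)
unlink(tmp)
```

Running this at a few values of n should indicate whether the full 160000-record file is likely to be a problem on a given machine.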

> 2. If using read.fwf is not a problem, wouldn't it be better to read
> all the records with read.fwf into a data frame with the Type 1
> structure, and then process the Type 2 records in the data frame,
> adding new fields for these records (NULL-valued for Type 1)?

If they had the same dividing points that might work, but I am  
guessing that it would not.
What about reading it in for one record type, outputting the good  
records, then redoing the input with the other record type?
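
A minimal sketch of that two-pass idea, using the toy records from earlier in the thread. The two width vectors and the helper read_type are hypothetical; the real layouts would come from the survey's record descriptions.

```r
# Toy records; the 4th character is the record type.
txt <- c("011125678", "011136779", "011124943",
         "011286711", "011234872", "011135628")
infile <- tempfile(fileext = ".txt")
writeLines(txt, infile)

# Assumed layouts: chars 1-3, then the type flag (char 4), then the payload,
# which Type 1 splits into two fields and Type 2 keeps as one.
widths1 <- c(3, 1, 2, 3)   # hypothetical Type 1 layout
widths2 <- c(3, 1, 5)      # hypothetical Type 2 layout

# Hypothetical helper: read every line with one layout, then keep only
# the rows whose type field matches; the other rows are discarded.
read_type <- function(path, widths, type, type_col = 2) {
  all_rows <- read.fwf(path, widths = widths, colClasses = "character")
  all_rows[all_rows[[type_col]] == type, , drop = FALSE]
}

type1 <- read_type(infile, widths1, "1")
type2 <- read_type(infile, widths2, "2")
unlink(infile)
```

This reads the file twice, which costs time, but it avoids building intermediate files or holding a separate copy of the raw lines.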

-- 
David.

> Hug
>
> --- On Mon, 2/1/10, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> From: David Winsemius <dwinsemius at comcast.net>
>> Subject: Re: [R] Import fixed-format ascii file with mixed record types
>> To: "trece por ciento" <el13porciento at yahoo.com>
>> Cc: r-help at r-project.org
>> Date: Monday, February 1, 2010, 2:23 PM
>>
>> On Feb 1, 2010, at 2:33 PM, trece por ciento wrote:
>>
>>> Thanks David, but can read.fwf cope with different record types?
>>> For example, if recordtype is the 4th character, I could have:
>>>
>>> 011125678 ---> This is record Type 1
>>> 011136779 ---> This is record Type 1
>>> 011124943 ---> This is record Type 1
>>> 011286711 ---> This is record Type 2
>>> 011234872 ---> This is record Type 2
>>> 011135628 ---> This is record Type 1
>>>
>>> So, how can I tell read.fwf to take the correct type into account?
>>
>> You may need to separate the line types first. If the
>> number of lines is not too large, then this would exemplify
>> a strategy:
>>
>>> txt <- "011125678
>> + 011136779
>> + 011124943
>> + 011286711
>> + 011234872
>> + 011135628"
>>
>>> substr(readLines(textConnection(txt)), 4,4)
>> [1] "1" "1" "1" "2" "2" "1"
>>> lines <- readLines(textConnection(txt))
>>> file1 <- lines[substr(lines, 4, 4) == "1"]
>>> file2 <- lines[substr(lines, 4, 4) == "2"]
>>> file1
>> [1] "011125678" "011136779" "011124943" "011135628"
>>> file2
>> [1] "011286711" "011234872"
>>
>> Then these text objects could be processed with
>> read.fwf(textConnection(file1), widths = ...), supplying the field
>> widths for that record type, and the same for file2.
>>
>> --David.
>>
>>> Thanks again,
>>> Hug
>>>
>>> --- On Mon, 2/1/10, David Winsemius <dwinsemius at comcast.net> wrote:
>>>
>>> From: David Winsemius <dwinsemius at comcast.net>
>>> Subject: Re: [R] Import fixed-format ascii file with mixed record types
>>> To: "trece por ciento" <el13porciento at yahoo.com>
>>> Cc: r-help at r-project.org
>>> Date: Monday, February 1, 2010, 12:01 PM
>>>
>>>
>>> On Feb 1, 2010, at 11:40 AM, trece por ciento wrote:
>>>
>>>> I need to import several ascii files in fixed format with two
>>>> different record types. The data come from European Labor Force
>>>> Surveys, which are household surveys. The first record type is for
>>>> people over 16 years, and the second, much shorter, is for people
>>>> aged 15 or less (this record has a filler of several blanks to get
>>>> the same record length).
>>>> The files typically have 160000 records, with 176 characters per
>>>> record. The data are numeric, corresponding to 102 variables, mostly
>>>> integers (seven variables have two decimals). My operating system is
>>>> Windows XP.
>>>> My questions:
>>>> 1. Which do you think is the best way to import the files into R?
>>>
>>>
>>> ?read.fwf
>>>
>>>> 2. Could you give me any references or examples?
>>>
>>> There are examples in the help page.
>>>
>>>> Thanking you in advance,
>>>> Hug
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


