[R] reading fixed width format data with 2 types of lines
Charles C. Berry
cberry at tajo.ucsd.edu
Thu Aug 12 22:59:26 CEST 2010
On Thu, 12 Aug 2010, Tim Gruene wrote:
> I don't know if it's elegant enough for you, but you could split the file into
> two files with 'grep "^3" file > file_3' and 'grep "^4" file > file_4'
> and then read them in separately.
>
Along the same lines, but all in R (untested):
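original.lines <- readLines( filename )
# keep only the lines whose first character is "3" and read them as fwf;
# <etc> stands for the widths (and other read.fwf arguments) of that record type
tcon.3 <- textConnection( grep( "^3", original.lines, value = TRUE ))
res.3 <- read.fwf( tcon.3, <etc> )
close(tcon.3)
# same for the lines that start with "4"
tcon.4 <- textConnection( grep( "^4", original.lines, value = TRUE ))
res.4 <- read.fwf( tcon.4, <etc> )
close(tcon.4)
rm( original.lines )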
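A self-contained sketch of the textConnection() approach above, run on a few of the sample lines from the question. The widths in `widths.demo` are hypothetical (record type, a 9-character id, a letter, 6 digits); the real field layouts were not posted in the thread. Reading every column as character keeps leading zeros such as those in 070049.

```r
# A few sample lines copied from the question
original.lines <- c(
  "3A00206546L070049016090045 99 1015002 001001008010004002004007003 001",
  "3A00206546L070049006090030 99 1029001002001001006014002",
  "4A00176546L068047090010111000606516400150010000001501063 065914",
  "4A00196546L098000100010111001706214400005010000000051062 065914"
)

# HYPOTHETICAL layout used for both record types; the real widths differ
widths.demo <- c(1, 9, 1, 6)

tcon.3 <- textConnection(grep("^3", original.lines, value = TRUE))
res.3 <- read.fwf(tcon.3, widths = widths.demo, colClasses = "character")
close(tcon.3)  # textConnection() returns an already-open connection

tcon.4 <- textConnection(grep("^4", original.lines, value = TRUE))
res.4 <- read.fwf(tcon.4, widths = widths.demo, colClasses = "character")
close(tcon.4)

res.3
res.4
```

Characters beyond the last width are simply ignored by read.fwf, so only the leading fields are read in this demo.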
Or skip the readLines() step and use
tcon.3 <- pipe(paste("grep '^3'", shQuote(filename)))
...
I think you can use 'findstr.exe' on Windows in lieu of grep.
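A sketch of the pipe() variant, assuming a Unix-like system with grep on the PATH. The temporary file stands in for a real data file and `widths.demo` is hypothetical. According to the read.fwf documentation, a connection that is not yet open is opened by read.fwf and closed at the end of the call, which is why there is no close() here.

```r
# Stand-in data file built from sample lines in the question
filename <- tempfile(fileext = ".txt")
writeLines(c(
  "3A00206546L070049016090045 99 1015002 001001008010004002004007003 001",
  "4A00176546L068047090010111000606516400150010000001501063 065914",
  "3A00206546L070049002290004 99 1001 002"
), filename)

# HYPOTHETICAL widths; the real record layouts were not posted
widths.demo <- c(1, 9, 1, 6)

# shQuote() protects file names containing spaces or shell metacharacters
tcon.3 <- pipe(paste("grep '^3'", shQuote(filename)))

# read.fwf() opens the unopened pipe itself and closes it when it is done
res.3 <- read.fwf(tcon.3, widths = widths.demo, colClasses = "character")

unlink(filename)
res.3
```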
HTH,
Chuck
> Tim
>
> On Thu, Aug 12, 2010 at 01:57:19PM -0400, Denis Chabot wrote:
>> Hi,
>>
>> I know how to read fixed width format data with read.fwf, but suddenly I need to read in a large number of old fwf files with 2 types of lines. Lines that begin with "3" in the first column carry one set of variables, and lines that begin with "4" carry another set, like this:
>>
>> …
>> 3A00206546L070049016090045 99 1015002 001001008010004002004007003 001
>> 3A00206546L070049006090030 99 1029001002001001006014002
>> 3A00206546L070049002290004 99 1015 001001
>> 3A00206546L070049001692559049033 1015 018036024
>> 3A00206546L070049002290004 99 1001 002
>> 4A00176546L068047090010111000606516400150010000001501063 065914
>> 4A00176546L06804709001011100040761600000000 1092 095614
>> 4A00196546L098000100010111001706214400005010000000051062 065914
>> 4A00176546L06804709001011100050591300000000 1062 065914
>> 4A00196546L098000100010111002604721400020010000000201042 046114
>> 4A00196546L098000100010111002504221400005012000000051042 046114
>> 4A00196546L098000100010111002903721400050012200000501032 036214
>> …
>>
>> I have searched for tricks to do this, but I must not have used the right keywords; I found nothing.
>>
>> I suppose I could read the entire file as a single character variable for each line, then subset the lines that begin with 3 and save them in an ASCII file that will then be reopened with a read.fwf call, and do the same with the lines that begin with 4. But this does not seem very elegant or efficient to me… Is there a better method?
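If a file has more than two record types, one possible generalization (a variant, not something proposed in the thread) is to split the lines once on their first character with split() and read each group with its own widths, with no intermediate file. The per-type widths in `widths.by.type` are hypothetical.

```r
# Sample lines from the question; in practice: lines <- readLines(filename)
lines <- c(
  "3A00206546L070049016090045 99 1015002 001001008010004002004007003 001",
  "3A00206546L070049001692559049033 1015 018036024",
  "4A00176546L068047090010111000606516400150010000001501063 065914",
  "4A00196546L098000100010111002604721400020010000000201042 046114",
  "4A00196546L098000100010111002504221400005012000000051042 046114"
)

# HYPOTHETICAL widths, one vector per record type, keyed by the first character
widths.by.type <- list("3" = c(1, 9, 1, 6), "4" = c(1, 9, 1, 6, 4))

record.type <- substr(lines, 1, 1)
table(record.type)  # how many lines of each type are present

by.type <- split(lines, record.type)

# Read one group of lines with the widths registered for its type
read.group <- function(type) {
  con <- textConnection(by.type[[type]])
  on.exit(close(con))
  read.fwf(con, widths = widths.by.type[[type]], colClasses = "character")
}

# Named list of data frames, one per record type
res <- lapply(setNames(names(by.type), names(by.type)), read.group)
res
```

A line whose first character has no entry in widths.by.type would leave read.group() without widths, so it is worth checking table(record.type) before reading.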
>>
>> Thanks in advance,
>>
>>
>> Denis Chabot
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
>
> GPG Key ID = A46BEE1A
>
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901