[R] Reading text file with fortran format
Nordlund, Dan (DSHS/RDA)
NordlDJ at dshs.wa.gov
Wed Oct 1 00:18:32 CEST 2014
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Steven Yen
> Sent: Tuesday, September 30, 2014 2:04 PM
> To: r-help
> Subject: [R] Reading text file with fortran format
>
> Hello
>
> I read data with fortran format:
> mydata<-read.fortran('foo.txt',
> c("4F10.4","F8.3","3F3.0","20F2.0"))
> colnames(mydata)<-c("q1","q2","q3","q4","income","hhsize",
> "weekend","dietk","quart1","quart2","quart3","male","age35",
> "age50","age65","midwest","south","west","nonmetro",
> "suburb","black","asian","other","hispan","hhtype1",
> "hhtype2","hhtype3","emp_stat")
> dstat(mydata,digits=6)
>
> I produced the following sample statistics for the first 4
> variables (q1,q2,q3,q4):
>
>         Mean  Std.dev  Min       Max   Obs
> q1  0.000923 0.002509    0  0.035245  5649
> q2  0.000698 0.001681    0  0.038330  5649
> q3  0.000766 0.002138    0  0.040100  5649
> q4  0.000373 0.001140    0  0.026374  5649
>
> The correct sample statistics are:
> Variable|         Mean     Std.Dev.      Minimum      Maximum
> --------+----------------------------------------------------
>       Q1|     9.227632     25.09311          0.0     352.4508
>       Q2|     6.983078     16.80984          0.0     383.2995
>       Q3|     7.657381     21.38337          0.0     400.9950
>       Q4|     3.727952     11.40446          0.0     263.7398
>   INCOME|     16.01603     13.70296          0.0        100.0
>   HHSIZE|     2.586475     1.464282          1.0         16.0
>
> In other words, values for q1-q4 were scaled down by a factor of
> 10,000.
> My raw data look like (with proper format)
>
>     0.0000    0.0000    0.0000    0.0000  48.108...
>     0.0000    0.0000    0.0000    0.0000  11.640...
>    35.3450    0.0000   95.7656    0.0000   4.667...
>     0.0000    0.0000    0.0000    0.0000   9.000...
>    84.0000    4.8038    0.0000    3.1886   2.923...
>     0.0000    0.0000    0.0000    1.1636  10.000...
>     0.0000   10.7818  109.7884    0.0000  17.000...
>     0.0000    7.9528    0.0000    4.7829  35.000...
>
> True that the data here are space delimited. But I need to read data
> elsewhere where data are not space delimited.
>
> Any idea/suggestion would be appreciated.
>
The read.fortran function appears to work differently from how FORTRAN itself would read the data when the numbers already contain decimal points. If memory serves, FORTRAN ignores the decimal portion of the format when it finds a decimal point in the field it reads. The read.fortran function appears to read the number 'as is' and then multiply it by 10^-d, where d is the number of decimal places in the format. With F10.4 that is a factor of 10^-4, which matches the scaling by 10,000 you are seeing. Since your data already contain decimal points, you should specify the formats with 0 decimal places, i.e.
mydata<-read.fortran('foo.txt',
c("4F10.0","F8.0","3F3.0","20F2.0"))
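
To check this without touching the real file, here is a minimal sketch under some assumptions: it rebuilds two of the rows quoted above (q1-q4 only) as fixed-width text, writes them to a temporary file standing in for 'foo.txt', and reads them back with zero decimal places in the format. The names rows, lines, tmp, demo and scaled_max are made up for the illustration. The last two lines only redo the arithmetic on the maximum of q1 from the two tables of statistics.

```r
# Hypothetical sample: two of the rows quoted above, q1-q4 only,
# written as four right-aligned 10-character fields per line.
rows <- list(c(35.3450, 0.0000, 95.7656, 0.0000),
             c(84.0000, 4.8038,  0.0000, 3.1886))
lines <- vapply(rows, function(r)
  paste(formatC(r, format = "f", digits = 4, width = 10), collapse = ""),
  character(1))

tmp <- tempfile(fileext = ".txt")   # temporary stand-in for 'foo.txt'
writeLines(lines, tmp)

# Zero decimal places in the format, so any 10^-d factor is 10^0 = 1
# and the decimal points already in the data are used as they are.
demo <- read.fortran(tmp, "4F10.0")
colnames(demo) <- c("q1", "q2", "q3", "q4")
unlink(tmp)
print(demo)

# Arithmetic check of the reported scaling: the correct maximum of q1
# times 10^-4, rounded to 6 digits as in the dstat output.
scaled_max <- round(352.4508 * 10^-4, 6)
print(scaled_max)
```

If the values in demo come back unscaled (35.345, 84, and so on), the same change should carry over to the full set of formats for the real file.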
Hope this is helpful,
Dan
Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services