[R] Reading fixed column format

Anupam Tyagi AnupTyagi at yahoo.com
Tue Sep 12 13:24:54 CEST 2006


Barry Rowlingson <B.Rowlingson <at> lancaster.ac.uk> writes:

> > None of these seem to read non-contiguous variables from columns; or
> > maybe I am missing something. "read.fwf" is not meant for large
> > files according to a post in the archives. Thanks for the pointers. I
> > have read the R data input and output. Anupam.
> 
>   First up, how 'large' is your 'large ASCII file'? How many rows and 
> columns?

There are 356,112 records and 326 variables. It has a fixed record length of
1283 positions, therefore "cut -b" cannot be used.
 
>   Secondly, what are 'non-contiguous' variables?

When I do not want to read all columns. For example, I would like to read the
following:

StartingColumn  VariableName  FieldLength
1               STATE         2
24              INTVID        3
27              DISPCODE      3
30              PSU           10
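
As a rough sketch of how the layout above could be read, read.fwf() accepts
negative widths, which skip that many characters. The helper gap_widths() and
the two-record sample file are made up for illustration, and whether read.fwf()
is fast enough for a file of this size is not tested here.

```r
# Layout from the table above: start column, name, field length.
layout <- data.frame(
  start = c(1, 24, 27, 30),
  name  = c("STATE", "INTVID", "DISPCODE", "PSU"),
  width = c(2, 3, 3, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn start/width pairs into read.fwf() widths.
# A negative width tells read.fwf() to skip that many characters.
gap_widths <- function(start, width) {
  out <- numeric(0)
  pos <- 1                                # next unread column
  for (i in seq_along(start)) {
    gap <- start[i] - pos                 # unwanted columns before this field
    if (gap > 0) out <- c(out, -gap)
    out <- c(out, width[i])
    pos <- start[i] + width[i]
  }
  out
}

widths <- gap_widths(layout$start, layout$width)   # 2, -21, 3, 3, 10

# Made-up sample standing in for the real file; columns 3-23 are filler.
tmp <- tempfile()
writeLines(c(
  paste0("01", strrep("x", 21), "123", "110", "0000000042"),
  paste0("36", strrep("x", 21), "456", "120", "0000000007")
), tmp)

# colClasses = "character" keeps leading zeros such as "01".
dat <- read.fwf(tmp, widths = widths, col.names = layout$name,
                colClasses = "character")
print(dat)
```

Anything after the last requested field is simply ignored, so the remaining
columns up to position 1283 need no entry in widths.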

Sometimes I would also like to format the data after it has been read. For
example, the ASCII file has price in columns 100 to 105 written as 005999. I
want to read this and format it as 59.99 (omitting leading zeros in the price).
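
A small sketch of that conversion, assuming the field always carries two
implied decimal places (the values other than 005999 are made up):

```r
# Price fields as they would appear in columns 100 to 105 of the file.
raw_price <- c("005999", "000250", "012000")

# as.numeric() drops the leading zeros; dividing by 100 restores the
# two implied decimal places (an assumption about the file format).
price <- as.numeric(raw_price) / 100

# Fixed two-decimal text for display.
formatted <- formatC(price, format = "f", digits = 2)
print(formatted)
```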

>   Perhaps if you posted the first few lines and columns of the file then 
> we might get an idea of how to read it in.

I have not even downloaded the data onto my computer yet, because I am not sure
I can read it in. The zipped file is 67 MB. Using similar data a few years ago, I
recall the unzipped file to be about 350-400 MB. I had used MySQL then, but it
took some doing to get it in, and there were things that did not seem to work as
I wanted them to: I could not figure out how to label the variables. I usually
do not have to work with a dataframe of more than 10-30 MB at a time.

It would be good to have a facility in R which defines the meta-data, that is,
the labelling and structure of the dataset: positions of variables, their names,
their labels, and their levels (e.g. for ordered choice or group variables: yes,
sometimes, no type responses). This can be saved as a separate object and passed
to a function that gets the named variables from the ASCII file (names of
variables to get can be given as arguments), attaches the meta-data, and creates
a dataframe with all the meta-data attached. The meta-data of the dataframe
could include notes at dataframe and variable level, and other information. This
information would be passed on to the plotting functions and used when
formatting the output of statistical procedures.
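
To make the idea concrete, here is a minimal sketch of such a facility, not an
existing R function. The names meta and read_fixed() and the label texts are
invented for illustration. It reads the file in chunks with readLines(), cuts
the requested fields out with substring(), and stores each label as an
attribute of its column.

```r
# Hypothetical meta-data object: one row per variable. Label texts are made up.
meta <- data.frame(
  name  = c("STATE", "INTVID", "DISPCODE", "PSU"),
  start = c(1, 24, 27, 30),
  width = c(2, 3, 3, 10),
  label = c("State code", "Interviewer ID",
            "Disposition code", "Primary sampling unit"),
  stringsAsFactors = FALSE
)

# Hypothetical reader: returns only the requested variables, as character
# columns, with the label attached. Assumes the file has at least one record.
read_fixed <- function(path, meta, vars = meta$name, chunk = 10000) {
  stopifnot(all(vars %in% meta$name))
  sel <- meta[match(vars, meta$name), ]
  con <- file(path, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk)     # one chunk of records at a time
    if (length(lines) == 0) break
    cols <- lapply(seq_len(nrow(sel)), function(i) {
      substring(lines, sel$start[i], sel$start[i] + sel$width[i] - 1)
    })
    names(cols) <- sel$name
    pieces[[length(pieces) + 1]] <- as.data.frame(cols, stringsAsFactors = FALSE)
  }
  out <- do.call(rbind, pieces)
  for (v in sel$name) {
    attr(out[[v]], "label") <- sel$label[sel$name == v]
  }
  out
}

# Made-up sample standing in for the real file; columns 3-23 are filler.
tmp <- tempfile()
writeLines(c(
  paste0("01", strrep("x", 21), "123", "110", "0000000042"),
  paste0("36", strrep("x", 21), "456", "120", "0000000007")
), tmp)

dat <- read_fixed(tmp, meta, vars = c("STATE", "PSU"))
print(attr(dat$PSU, "label"))
```

Plotting and modelling functions in base R do not look at a "label" attribute,
so using it in output would still need wrapper functions.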

I agree with Michael Kobovy that this is a very helpful list, and people do
not owe less than what one paid for the software :)

Anupam.


