[R] Reading fixed column format
murdoch at stats.uwo.ca
Wed Sep 13 13:06:36 CEST 2006
Anupam Tyagi wrote:
> Barry Rowlingson <B.Rowlingson <at> lancaster.ac.uk> writes:
>>> None of these seem to read non-contiguous variables from columns; or
>>> maybe I am missing something. "read.fwf" is not meant for large
>>> files according to a post in the archives. Thanks for the pointers. I
>>> have read the R data input and output. Anupam.
>> First up, how 'large' is your 'large ASCII file'? How many rows and
> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore "cut -b" can not be used.
>> Secondly, what are 'non-contiguous' variables?
> When I do not want to read all columns. For example, I would like to read the
> StartingColumn  VariableName  FieldLength
>  1              STATE          2
> 24              INTVID         3
> 27              DISPCODE       3
> 30              PSU           10
read.fwf() can handle the skipped columns (you use negative values in
the widths argument; see the man page). It reads the file in blocks
(see the buffersize argument), so the large size of the original file
shouldn't be a problem.
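
As a rough sketch of the negative-width approach for the four variables
listed above: the two sample records and the temporary file below are
made up for illustration (the real file has 1283-character records), and
reading everything as character is just one possible choice.

```r
# Hypothetical helper that builds one sample record: filler characters
# stand in for the columns we do not want to read.
rec <- function(state, intvid, dispcode, psu) {
  paste0(state, strrep("x", 21), intvid, dispcode, psu, strrep("x", 5))
}

tmp <- tempfile(fileext = ".txt")
writeLines(c(rec("01", "123", "110", "0000000042"),
             rec("36", "456", "120", "0000000077")), tmp)

# STATE is columns 1-2, columns 3-23 are skipped (-21),
# INTVID is 24-26, DISPCODE is 27-29, PSU is 30-39.
# Anything after the last width is ignored.
widths <- c(2, -21, 3, 3, 10)

dat <- read.fwf(tmp, widths = widths,
                col.names = c("STATE", "INTVID", "DISPCODE", "PSU"),
                colClasses = "character",  # keep leading zeros
                buffersize = 2000)         # lines read per block
print(dat)
```

For the real file you would pass its path instead of tmp; the widths
vector stays the same.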
> Sometimes I would also like to format the data after it has been read. For
> example, the ASCII file has price in columns 100 to 105 written as 005999. I
> want to read this and format it as 59.99 (omitting leading zeros in the price).
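
One way to do the price conversion after reading, assuming the field
always carries two implied decimal places (as in 005999 -> 59.99). The
sample values other than 005999 are invented for illustration.

```r
# Price field as it might come out of the file, read as character.
raw_price <- c("005999", "000250", "120000")

# Converting to numeric drops the leading zeros; dividing by 100
# restores the two implied decimal places.
price <- as.numeric(raw_price) / 100

# Optional: a character version with exactly two decimals for display.
formatted <- sprintf("%.2f", price)
print(formatted)
```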
>> Perhaps if you posted the first few lines and columns of the file then
>> we might get an idea of how to read it in.
> I have not even downloaded the data onto my computer yet, because I am not sure
> I can read it in. The zipped file is 67MB. Using similar data a few years ago, I
> recall the unzipped file to be about 350--400 MB. I had used MySQL then, but it
> took some doing to get it in, and there were things that did not seem to work as
> I wanted them to---I could not figure out how to label the variables. I usually
> do not have to work with a dataframe of more than 10-30 MB at a time.
> It would be good to have a facility in R which defines the meta-data: labelling
> and structure of the dataset: positions of variables, their names, their labels,
> their levels (e.g. for ordered choice or group variables: yes, sometimes, no
> type responses). This can be saved as a separate object and passed to a function
> that gets the named variables from the ASCII file (names of variables to get can
> be given as arguments or as, attaches the meta data and creates a dataframe with
> all the meta-data attached. The meta-data of the dataframe could include notes
> at dataframe and variable level, and other information. This information is
> passed on to the plotting functions and used when formatting the output of
> statistical procedures.
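
Something along these lines can be put together from read.fwf() and
attributes. The sketch below is only an illustration: the dictionary
layout, the function names fwf_widths() and read_vars(), and the labels
are all hypothetical, and plotting or modelling functions will not use
the "label" attribute unless they are written to look for it.

```r
# Hypothetical data dictionary built from the layout quoted in this
# thread; the labels are made up for illustration.
dict <- data.frame(
  name  = c("STATE", "INTVID", "DISPCODE", "PSU"),
  start = c(1, 24, 27, 30),
  len   = c(2, 3, 3, 10),
  label = c("State code", "Interviewer ID",
            "Disposition code", "Primary sampling unit"),
  stringsAsFactors = FALSE
)

# Build a read.fwf() widths vector for the requested variables,
# inserting a negative width for every gap between them.
fwf_widths <- function(dict, vars) {
  d <- dict[dict$name %in% vars, ]
  d <- d[order(d$start), ]
  widths <- numeric(0)
  pos <- 1                                  # next unread column
  for (i in seq_len(nrow(d))) {
    gap <- d$start[i] - pos
    if (gap > 0) widths <- c(widths, -gap)  # skip unwanted columns
    widths <- c(widths, d$len[i])
    pos <- d$start[i] + d$len[i]
  }
  list(widths = widths, names = d$name, labels = d$label)
}

# Read only the requested variables and attach each label as an
# attribute on its column. Extra arguments go to read.fwf().
read_vars <- function(file, dict, vars, ...) {
  spec <- fwf_widths(dict, vars)
  dat <- read.fwf(file, widths = spec$widths, col.names = spec$names, ...)
  for (i in seq_along(spec$names)) {
    attr(dat[[spec$names[i]]], "label") <- spec$labels[i]
  }
  dat
}

# Widths needed to read just STATE and PSU.
print(fwf_widths(dict, c("STATE", "PSU"))$widths)
```

Keeping the dictionary as an ordinary data frame means it can be saved
and reused like any other R object. Note that attributes set this way
are often dropped when a column is subsetted or transformed.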
> I agree with Michael Kobovy that this is a very helpful list, and people do
> not owe less than what one paid for the software :)