[R] Reading fixed column format

Barry Rowlingson B.Rowlingson at lancaster.ac.uk
Wed Sep 13 13:15:53 CEST 2006


Anupam Tyagi wrote:

> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore "cut -b" can not be used.

Okay, that's 'large' enough to be awkward...

> It would be good to have a facility in R which defines the meta-data: labelling
> and structure of the dataset: positions of variables, their names, their labels,
> their levels (e.g. for ordered choice or group variables: yes, sometimes, no
> type responses). This can be saved as a separate object and passed to a function
> that gets the named variables from the ASCII file (names of variables to get can
> be given as arguments), attaches the meta-data and creates a dataframe with
> all the meta-data attached. The meta-data of the dataframe could include notes
> at dataframe and variable level, and other information. This information is
> passed on to the plotting functions and used when formatting the output of
> statistical procedures.

  I think you need the following functions to build that kind of thing in R:

  * z = unz("/tmp/file.zip","data.dat") - to create a connection to a 
file in a zip archive - this saves you having to explicitly unzip it...

  * open(z) - to open the connection to the file in the zip...

  * readLines(z,n) - to read 'n' lines from the current position in the 
file...

  * seek(z,(m-1)*lineLength) - to jump to the start of line 'm' (counting from 1, with lineLength including the line terminator), ready to read it.

  Then it's just 'substr' and similar string-chopping functions to build 
up the data from each line you want.
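
  A rough sketch of how those pieces might fit together, untested on anything like the real file. The layout table, the variable names, the toy 10-character records and the readRecords() helper are all made up for illustration. It uses a plain file() connection rather than unz(), because seek() may not be supported on unz connections (isSeekable() will tell you), and it assumes every record ends in a single LF, so that record 'm' starts at byte (m-1)*lineLength.

```r
# Hypothetical meta-data: name and column positions of each variable.
layout <- data.frame(
  name  = c("id", "age", "answer"),
  start = c(1, 5, 8),
  end   = c(4, 7, 10),
  stringsAsFactors = FALSE
)

# Toy fixed-width file standing in for the real data (10 characters per record).
records <- c("0001 34yes", "0002 51no ", "0003 27yes", "0004 45no ")
path <- tempfile(fileext = ".dat")
out <- file(path, "wb")
writeLines(records, out, sep = "\n")  # binary mode, so each line is exactly 11 bytes
close(out)

# Read 'n' records starting at record 'm' (1-based), keeping only 'vars'.
# 'lineLength' is the record width plus the line terminator.
readRecords <- function(path, layout, m, n, lineLength, vars = layout$name) {
  con <- file(path, "rb")
  on.exit(close(con))
  stopifnot(isSeekable(con))
  seek(con, where = (m - 1) * lineLength)   # jump to the start of record m
  lines <- readLines(con, n = n)            # read n records from there
  keep <- layout[match(vars, layout$name), ]
  # Chop each requested variable out of the lines by column position.
  fields <- lapply(seq_len(nrow(keep)), function(i) {
    trimws(substr(lines, keep$start[i], keep$end[i]))
  })
  names(fields) <- keep$name
  as.data.frame(fields, stringsAsFactors = FALSE)
}

chunk <- readRecords(path, layout, m = 2, n = 2, lineLength = 11,
                     vars = c("id", "answer"))
```

  Everything comes back as character strings here; converting columns to numbers or factors, and attaching labels, would be a further step driven by the same layout table. For the real file lineLength would presumably be 1283 plus the line terminator (1 byte for LF, 2 for CRLF).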

  If I had a spare day...

Barry


