[R] Reading fixed column format
B.Rowlingson at lancaster.ac.uk
Wed Sep 13 13:15:53 CEST 2006
Anupam Tyagi wrote:
> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore "cut -b" can not be used.
Okay, that's 'large' enough to be awkward...
> It would be good to have a facility in R which defines the meta-data: labelling
> and structure of the dataset: positions of variables, their names, their labels,
> their levels (e.g. for ordered choice or group variables: yes, sometimes, no
> type responses). This can be saved as a separate object and passed to a function
> that gets the named variables from the ASCII file (names of variables to get can
> be given as arguments or as, attaches the meta data and creates a dataframe with
> all the meta-data attached. The meta-data of the dataframe could include notes
> at dataframe and variable level, and other information. This information is
> passed on to the plotting functions and used when formatting the output of
> statistical procedures.
I think you need the following functions to build that kind of thing in R:
* z = unz("/tmp/file.zip","data.dat") - to create a connection to a
file in a zip archive - this saves you having to explicitly unzip it...
* open(z) - to open the connection to the file in the zip...
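* readLines(z,n) - to read 'n' lines from the current position in the
file...
* seek(z,(m-1)*lineLength) - to jump to the start of line 'm' ready to
read it, where lineLength counts the line terminator and the connection
supports seeking.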
Then it's just 'substr' and similar string-chopping functions to build
up the data from each line you want.
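
A minimal sketch of how those pieces might fit together, assuming a
made-up three-variable layout and a tiny temporary file standing in for
the real data. The names 'spec', 'read_fixed' and the column positions
are hypothetical, not part of any existing package. It reads the
connection sequentially in chunks with readLines rather than relying on
seek, since seeking may not be available on every connection type, and
it keeps each variable's label as an attribute on the resulting column.

```r
# Hypothetical meta-data: one row per variable, with 1-based start/end
# column positions and a descriptive label.
spec <- data.frame(
  name  = c("id", "age", "resp"),
  start = c(1, 6, 9),
  end   = c(5, 8, 9),
  label = c("Record id", "Age in years", "Response code"),
  stringsAsFactors = FALSE
)

# Small made-up fixed-width file standing in for the real data.
path <- tempfile(fileext = ".dat")
writeLines(c("00001 341", "00002 272", "00003 453"), path)

# Read only the requested variables, 'chunk' lines at a time, from an
# already opened connection. Assumes the file has at least one record.
read_fixed <- function(con, spec, vars = spec$name, chunk = 1000) {
  keep <- spec[match(vars, spec$name), ]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) break
    # Chop each wanted field out of every line in the chunk.
    cols <- lapply(seq_len(nrow(keep)), function(i)
      trimws(substr(lines, keep$start[i], keep$end[i])))
    names(cols) <- keep$name
    pieces[[length(pieces) + 1]] <-
      as.data.frame(cols, stringsAsFactors = FALSE)
  }
  out <- do.call(rbind, pieces)
  # Attach the variable-level meta-data to each column.
  for (i in seq_len(nrow(keep)))
    attr(out[[keep$name[i]]], "label") <- keep$label[i]
  out
}

con <- file(path, open = "r")
dat <- read_fixed(con, spec, vars = c("id", "resp"), chunk = 2)
close(con)
```

The same function should accept an opened unz() connection in place of
file(), which would avoid unzipping the archive by hand. Fields come back
as character strings, so numeric variables would still need as.numeric
or similar afterwards.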
If I had a spare day...