Variable labels (was Re: [R] Reading SAS version 8 data into

Martyn Plummer plummer at iarc.fr
Fri Aug 24 10:49:08 CEST 2001


On 24-Aug-2001 Prof Brian D Ripley wrote:
> On Fri, 24 Aug 2001 pauljohn at ukans.edu wrote:
> 
>> I will try this method to export a sas file, but reading it made
>> me wonder about "variable labels" and "value labels" in R.  In
>> SAS and SPSS, the labels are a huge chunk of code and people
>> want to hang onto them.  In case you have not used data from the
>> University of Michigan's ICPSR, you might not have seen how
>> elaborate this can get.  Here's a link to a SAS program that
>> reads in an ASCII dataset. It has thousands of labels:
>>
>> http://lark.cc.ukans.edu/~pauljohn/sa2684.gz
>>
>> (This is a famous one, the American National Election Study)
>>
>> Netscape unzips this and shows it as text on the screen.
>>
>> A program like SAS or SPSS will use these labels to beautify
>> frequencies and such, and I've not heard much in the R group
>> about it, and I just wondered if you do ever talk about it.
> 
> Because it's no big deal. Those are factor levels.  R has factors.
> Whether they get exported from SAS and converted by read.xpt I can't say.

Preserving value labels from SAS datasets is not as easy as it should be.

SAS value labels are not part of the dataset, but are kept in a separate
file called a format catalogue. The XPORT engine does not work with SAS
catalogues, so you need to convert the format catalogue to a SAS dataset.
You can do this with the CNTLOUT= option in PROC FORMAT. [Conversely, the
CNTLIN= option creates a format catalogue from a dataset.]

We use a program called Stat/Transfer to convert between different
file formats.  Recent versions of Stat/Transfer will preserve SAS value
labels if you supply a format dataset.  [It doesn't support R, but you
can get from SAS to R via Stata.]  I suppose that you could get read.xport
to work the same way ...
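
As a rough sketch of what that might look like on the R side: assume the
CNTLOUT= dataset has already been exported and read into a data frame (for
example with read.xport from the foreign package) and that it carries the
usual FMTNAME, START and LABEL columns.  The data below, the format names
and the helper apply_format are all invented for illustration, and the
helper only copes with one-to-one formats, not ranges.

```r
# Hypothetical format dataset, shaped like PROC FORMAT CNTLOUT= output
# after export and import; the values are invented for illustration.
fmt <- data.frame(
  FMTNAME = c("SEXFMT", "SEXFMT", "YESNO", "YESNO"),
  START   = c("1", "2", "0", "1"),
  LABEL   = c("Male", "Female", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn a coded vector into a factor using one format.
# Assumes each label corresponds to a single START value (no ranges).
apply_format <- function(x, fmt, name) {
  f <- fmt[fmt$FMTNAME == name, ]
  factor(as.character(x), levels = trimws(f$START), labels = f$LABEL)
}

sex   <- c(1, 2, 2, 1, 2)
sex_f <- apply_format(sex, fmt, "SEXFMT")
table(sex_f)
```

The original codes stay in sex, so nothing is lost by building the factor
alongside them.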

SAS value labels are not quite the same as S factor labels since
the mapping from values to labels may be many-to-one.  For example, you
can categorize a continuous variable by supplying ranges of values to be
given the same label.  The variable is then treated like a categorical
variable in tabulations, etc. but the underlying values are preserved
in the dataset and may be recovered by changing the format.
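
The nearest equivalent I can think of in R is to keep the numeric variable
and derive a factor from it with cut().  A small sketch with invented ages
and break points follows; note that, unlike a SAS format, the factor is a
separate object rather than a display attribute of the original variable.

```r
# Invented ages; the numeric values stay in the data untouched.
age <- c(23, 37, 45, 61, 18, 52)

# Many-to-one: every value in a range gets the same label.
# right = FALSE closes intervals on the left: [0,30), [30,50), [50,Inf)
age_grp <- cut(age, breaks = c(0, 30, 50, Inf),
               labels = c("<30", "30-49", "50+"), right = FALSE)
table(age_grp)

# "Changing the format" is just another call to cut() on the same values.
age_grp2 <- cut(age, breaks = c(0, 40, Inf),
                labels = c("<40", "40+"), right = FALSE)
table(age_grp2)
```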

Martyn