[R] use of variable labels
Thomas Lumley
tlumley at u.washington.edu
Wed Apr 9 00:19:06 CEST 2003
On Tue, 8 Apr 2003, janet rosenbaum wrote:
>
> The mean was just an example. We have a 4000 line program that expects
> numbers. I was hoping that there would be some way of dealing with this
> problem on the level of the data.frame.
as.data.frame(lapply(df, as.numeric))
would work if each of your variables were either unlabelled or completely
labelled, but it doesn't seem any simpler than using convert.factors=FALSE.
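As a small illustration of that one-liner, here is a sketch on a hand-built data frame. The column names and values are hypothetical stand-ins for what read.dta() might return, not the original data. It also shows one thing worth checking: as.numeric() on a factor returns the level codes (1, 2, ...), not whatever numbers the labels originally stood for.

```r
# Hypothetical data frame standing in for read.dta() output:
# one unlabelled numeric column and one fully labelled (factor) column.
df <- data.frame(
  age    = c(18, 41, 88),                        # unlabelled: already numeric
  region = factor(c("north", "south", "north"))  # fully labelled: a factor
)

# Convert every column to numeric in one pass.
num <- as.data.frame(lapply(df, as.numeric))
str(num)

# Caution: the factor column now holds level codes (north = 1, south = 2),
# not any original Stata numbers behind the labels.
```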
> I'm guessing I'm just going to have to throw out the labels since it's
> not practical to cast as a number every time and I also just noticed
> something strange about having convert.factors=TRUE:
>
> When I do
> read.dta("filename.dta")
> some of the variables which are numbers are read as NA:
>       age          educyrs
>  refuse:   0    refuse:   0
>  DK    :   0    DK    :   0
>  NA's  :1068    NA's  :1068
>
> When I do
> read.dta("filename.dta", convert.factors=FALSE)
> the variables are again treated like numbers:
>
>       age            educyrs
>  Min.   :18.00    Min.   : 0.00
>  1st Qu.:30.00    1st Qu.: 5.00
>  Median :41.00    Median : 9.00
>  Mean   :43.18    Mean   : 8.65
>  3rd Qu.:54.00    3rd Qu.:12.00
>  Max.   :88.00    Max.   :40.00
>  NA's   :18.00    NA's   :87.00
>
> I'm guessing that this means that by default -only- the labels are used
> when convert.factors=TRUE, and even variables without labels have to be
> cast as numbers.
No, that is not the case. I suspect you have value labels (such as
"refuse" and "DK") declared in Stata for these variables; it's just that
the variables don't actually take on those values.
read.dta does assume that if any value of a variable has a label, then all
values do, and values without a label become NA. So it doesn't, e.g.,
handle labels for different types of missing value on an otherwise
numeric variable.
-thomas