[R] Value Labels: SPSS Dataset to R
John Kane
jrkrideau using gmail.com
Sat Feb 8 16:28:45 CET 2020
"use a different function (read_spss) as John has suggested to import the
file. "
No! As far as I can see, sjlabelled is simply using haven's function
read_sav() to read in the data. It is just wrapped in sjlabelled's
read_spss() function. There should be no difference between
read_sav("sdata.sav") and read_spss("sdata.sav").
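If you want to check this on your own machine, here is a minimal sketch. It assumes both haven and sjlabelled are installed; the toy data, the temporary file, and the object names are all made up for illustration, and the column class that read_spss() returns may depend on the sjlabelled version.

```r
# Sketch: write a tiny labelled data set to a temporary .sav file, then read
# it back with haven::read_sav() and sjlabelled::read_spss() and compare the
# value labels. All names and values here are invented for the example.
have_pkgs <- requireNamespace("haven", quietly = TRUE) &&
  requireNamespace("sjlabelled", quietly = TRUE)

if (have_pkgs) {
  toy <- data.frame(id = 1:4)
  toy$sex <- haven::labelled(c(1, 2, 2, 1), labels = c(MALE = 1, FEMALE = 2))

  path <- tempfile(fileext = ".sav")
  haven::write_sav(toy, path)

  via_haven <- haven::read_sav(path)
  via_sj    <- sjlabelled::read_spss(path)

  # Label text as seen through each route
  haven_labels <- names(attr(via_haven$sex, "labels"))
  sj_labels    <- sjlabelled::get_labels(via_sj$sex)

  print(haven_labels)
  print(sj_labels)
}
```

If both routes print the same label text, either function will do for importing.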
It just seems to keep the code simpler (more aesthetically pleasing?) if
you do not load more packages than needed. Likewise, you do not need to load
"labelled", as sjlabelled is taking care of this for you.
Oh, BTW Scratch$sex %>% attr('labels') can be replaced by something like
get_labels(dat1) in my example. There usually are a multitude of ways to do
the same thing in R.
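
To make the attr() route concrete: the value labels that haven imports are just an attribute on the vector, so base R can read them as well. A small sketch with an invented column (the class attribute is left off so that nothing beyond base R is needed):

```r
# A labelled column roughly as haven stores it: numeric codes plus a
# "labels" attribute. The values are invented for illustration.
sex <- structure(c(1, 1, 2, 2, 2),
                 label  = "Respondent's sex",
                 labels = c(MALE = 1, FEMALE = 2))

value_labels <- attr(sex, "labels")   # named numeric vector: codes named by label
label_text   <- names(value_labels)   # just the label strings

print(value_labels)
print(label_text)
```

As far as I can tell, get_labels() gives you the label text directly, much like label_text here, while attr(x, "labels") also keeps the codes.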
You might want to have a look at
https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
and https://strengejacke.github.io/sjlabelled/articles/labelleddata.html
for more about working with labels.
On Sat, 8 Feb 2020 at 09:35, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> Thanks so much for all your assistance. I admit R's learning curve is a
> bit steep, but I am eager to learn ... and hopefully teach with it.
>
> with regard to my problem, I can now see two options: either declare each
> categorical variable as factors, specifying the needed levels and labels.
>
> OR
>
> use a different function (read_spss) as John has suggested to import the
> file.
>
> I will experiment with both.
>
> With much appreciation, cY
>
> On Sat, Feb 8, 2020 at 9:25 AM John Kane <jrkrideau using gmail.com> wrote:
>
>> Hi Yawo Kokuvi;
>> As an R newbie transitioning from SPSS to R, expect culture shock and the
>> possible feeling that your brain is twisting within your skull, but it is
>> well worth it.
>>
>> Try something like this:
>> ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> dat1 <- structure(list(
>>   Animal = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>     label = "Animal", labels = c(Cat = 0, Dog = 1),
>>     class = "haven_labelled"),
>>   Training = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>     label = "Type of Training",
>>     labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
>>     class = "haven_labelled"),
>>   Dance = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
>>     label = "Did they dance?", labels = c(No = 0, Yes = 1),
>>     class = "haven_labelled")),
>>   row.names = c(NA, -10L),
>>   class = c("tbl_df", "tbl", "data.frame"))
>>
>>
>> library(sjlabelled)
>> str(dat1)
>> get_labels(dat1)
>> barplot(table(as_label(dat1$Dance)))
>> ##==================================================================
>> Your problem seems to be omitting the as_label().
>>
>> You do not need to load "haven";
>> read_spss() in sjlabelled should do the trick.
>>
>>
>> On Sat, 8 Feb 2020 at 05:44, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>
>>> Hello,
>>>
>>> Try
>>>
>>> aux_fun <- function(x){
>>> levels <- attr(x, "labels")
>>> factor(x, labels = names(levels), levels = levels)
>>> }
>>>
>>> newCatsDogs <- as.data.frame(lapply(CatsDogs, aux_fun))
>>>
>>> str(newCatsDogs)
>>> #'data.frame': 10 obs. of 3 variables:
>>> # $ Animal : Factor w/ 2 levels "Cat","Dog": 1 1 1 1 1 1 1 1 1 1
>>> # $ Training: Factor w/ 2 levels "Food as Reward",..: 1 1 1 1 1 1 1 1 1 1
>>> # $ Dance : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2
>>>
>>>
>>> As for the
>>> - frequencies: ?table, ?tapply, ?aggregate,
>>> - barplots: ?barplot
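
A short base-R sketch of the frequencies and barplot step, once the codes have been turned into a factor. The Training values below are invented so that both categories appear (in the posted data they are all 0), and the class attribute is omitted so only base R is needed:

```r
# Labelled codes in the style of the CatsDogs data; toy values for illustration.
training <- structure(c(0, 1, 1, 0, 1),
                      labels = c(`Food as Reward` = 0,
                                 `Affection as Reward` = 1))

# Turn the codes into a factor whose levels carry the value labels.
labs <- attr(training, "labels")
training_f <- factor(as.vector(training), levels = labs, labels = names(labs))

# Frequencies and proportions now show the labels rather than 0/1.
freq <- table(training_f)
print(freq)
print(round(prop.table(freq), 2))

# Bar chart with labelled bars.
barplot(freq, main = "Type of Training", ylab = "Count")
```

The same table() and barplot() calls should work on any column converted with aux_fun() above.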
>>>
>>> You can find lots and lots of examples online of both, covering what
>>> seem to be simple use cases.
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> At 06:03 on 08/02/20, Yawo Kokuvi wrote:
>>> > Thanks for all. Here is output from dput. I used a different dataset
>>> > containing categorical variables since the previous one is on a
>>> > different computer.
>>> >
>>> > In the following dataset, my interest is in getting frequencies and
>>> > barplots for the two variables: Training and Dance, with value labels
>>> > displayed.
>>> >
>>> > thanks again - cY
>>> >
>>> >
>>> > =========
>>> > dput(head(CatsDogs, n = 10))
>>> > structure(
>>> > list(
>>> > Animal = structure(
>>> > c(0, 0, 0, 0, 0, 0, 0, 0, 0,
>>> > 0),
>>> > label = "Animal",
>>> > labels = c(Cat = 0, Dog = 1),
>>> > class = "haven_labelled"
>>> > ),
>>> > Training = structure(
>>> > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>> > label = "Type of Training",
>>> > labels = c(`Food as Reward` = 0,
>>> > `Affection as Reward` = 1),
>>> > class = "haven_labelled"
>>> > ),
>>> > Dance = structure(
>>> > c(1,
>>> > 1, 1, 1, 1, 1, 1, 1, 1, 1),
>>> > label = "Did they dance?",
>>> > labels = c(No = 0,
>>> > Yes = 1),
>>> > class = "haven_labelled"
>>> > )
>>> > ),
>>> > row.names = c(NA,-10L),
>>> > class = c("tbl_df", "tbl", "data.frame")
>>> > )
>>> >
>>> >
>>> > On Fri, Feb 7, 2020 at 10:14 PM Bert Gunter <bgunter.4567 using gmail.com>
>>> wrote:
>>> >
>>> >> Yes. Most attachments are stripped by the server.
>>> >>
>>> >> Bert Gunter
>>> >>
>>> >> "The trouble with having an open mind is that people keep coming
>>> >> along and sticking things into it."
>>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> >>
>>> >>
>>> >> On Fri, Feb 7, 2020 at 5:34 PM John Kane <jrkrideau using gmail.com> wrote:
>>> >>
>>> >>> Hi,
>>> >>> Could you upload some sample data in dput form? Something like
>>> >>> dput(head(Scratch, n=13)) will give us some real data to examine.
>>> >>> Just copy and paste the output of dput(head(Scratch, n=13)) into the
>>> >>> email. This is the best way to ensure that R-help denizens are getting
>>> >>> the data in the exact format that you have.
>>> >>>
>>> >>> On Fri, 7 Feb 2020 at 15:32, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
>>> >>>
>>> >>>> Thanks for all your assistance
>>> >>>>
>>> >>>> Attached please is the Rdata scratch I have been using
>>> >>>>
>>> >>>> -----------------------------------------------------
>>> >>>>
>>> >>>>> head(Scratch, n=13)
>>> >>>> # A tibble: 13 x 6
>>> >>>> ID marital sex race paeduc speduc
>>> >>>> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
>>> >>>> 1 1 3 [DIVORCED] 1 [MALE] 1 [WHITE] NA NA
>>> >>>> 2 2 1 [MARRIED] 1 [MALE] 1 [WHITE] NA NA
>>> >>>> 3 3 3 [DIVORCED] 1 [MALE] 1 [WHITE] 4 NA
>>> >>>> 4 4 4 [SEPARATED] 1 [MALE] 1 [WHITE] 16 NA
>>> >>>> 5 5 3 [DIVORCED] 1 [MALE] 1 [WHITE] 18 NA
>>> >>>> 6 6 1 [MARRIED] 2 [FEMALE] 1 [WHITE] 14 20
>>> >>>> 7 7 1 [MARRIED] 2 [FEMALE] 2 [BLACK] NA 12
>>> >>>> 8 8 1 [MARRIED] 2 [FEMALE] 1 [WHITE] NA 12
>>> >>>> 9 9 3 [DIVORCED] 2 [FEMALE] 1 [WHITE] 11 NA
>>> >>>> 10 10 1 [MARRIED] 2 [FEMALE] 1 [WHITE] 16 12
>>> >>>> 11 11 5 [NEVER MARRIED] 2 [FEMALE] 2 [BLACK] NA NA
>>> >>>> 12 12 3 [DIVORCED] 2 [FEMALE] 2 [BLACK] NA NA
>>> >>>> 13 13 3 [DIVORCED] 2 [FEMALE] 2 [BLACK] 16 NA
>>> >>>>
>>> >>>> -----------------------------------------------------
>>> >>>>
>>> >>>> and below is my script/command file.
>>> >>>>
>>> >>>> *#1: Load library and import SPSS dataset*
>>> >>>> library(haven)
>>> >>>> Scratch <- read_sav("~/Desktop/Scratch.sav")
>>> >>>>
>>> >>>> *#2: save the dataset with a name*
>>> >>>> save(Scratch, file="Scratch.Rdata")
>>> >>>>
>>> >>>> *#3: install & load necessary packages for descriptive statistics*
>>> >>>> install.packages ("freqdist")
>>> >>>> library (freqdist)
>>> >>>>
>>> >>>> install.packages ("sjlabelled")
>>> >>>> library (sjlabelled)
>>> >>>>
>>> >>>> install.packages ("labelled")
>>> >>>> library (labelled)
>>> >>>>
>>> >>>> install.packages ("surveytoolbox")
>>> >>>> library (surveytoolbox)
>>> >>>>
>>> >>>> *#4: Check the value labels of gender and marital status*
>>> >>>> Scratch$sex %>% attr('labels')
>>> >>>> Scratch$marital %>% attr('labels')
>>> >>>>
>>> >>>> *#5: Frequency Distribution and BarChart for Categorical/Ordinal
>>> >>>> Level Variables such as Gender - SEX*
>>> >>>> freqdist(Scratch$sex)
>>> >>>> barplot(table(Scratch$marital))
>>> >>>>
>>> >>>> -----------------------------------------------------
>>> >>>>
>>> >>>> As you can see from above, I use the <haven> package to import the
>>> >>>> data from SPSS. Apparently, the haven function keeps the value labels,
>>> >>>> as the attribute options in section #4 of my script show.
>>> >>>> The problem is that when I run frequency distribution for any of the
>>> >>>> categorical variables like sex or marital status, only the numbers
>>> >>>> (1, 2,) are displayed in the output. The labels (male, female) for
>>> >>>> example are not.
>>> >>>>
>>> >>>> Is there any way to force these to be shown in the output? Is there a
>>> >>>> global property that I have to set so that these value labels are
>>> >>>> reliably displayed with every output? I read I can declare them as
>>> >>>> factors using the <as_factor()>, but once I do so, how do I invoke
>>> >>>> them in my commands so that the value labels show...
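
On the as_factor() question: one option, sketched here on the assumption that haven is installed, is to convert every labelled column in one step and then use the converted data frame in later commands. The data and object names below are invented for illustration.

```r
# Sketch: convert all labelled columns of a data frame to factors at once
# with haven::as_factor(), then tabulate. Toy data, not the poster's file.
if (requireNamespace("haven", quietly = TRUE)) {
  scratch <- data.frame(id = 1:4)
  scratch$sex <- haven::labelled(c(1, 2, 2, 1),
                                 labels = c(MALE = 1, FEMALE = 2))

  # By default only labelled columns are converted; id is left untouched.
  scratch_f <- haven::as_factor(scratch)

  # Output functions now see factor levels, so the labels are displayed.
  print(table(scratch_f$sex))
  barplot(table(scratch_f$sex))
}
```

After the conversion, you refer to the columns exactly as before (scratch_f$sex and so on); the labels travel with the factor levels.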
>>> >>>>
>>> >>>> Sorry about all the noob questions, but hopefully I am able to get
>>> >>>> this working.
>>> >>>>
>>> >>>> Thanks in advance.
>>> >>>>
>>> >>>>
>>> >>>> Thanks - cY
>>> >>>>
>>> >>>>
>>> >>>> On Fri, Feb 7, 2020 at 1:14 PM <cpolwart using chemo.org.uk> wrote:
>>> >>>>
>>> >>>>> I've never used it, but there is a labels function in haven...
>>> >>>>>
>>> >>>>> On 7 Feb 2020 17:05, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>> >>>>>
>>> >>>>> What does your data look like after importing? -- see ?head and ?str
>>> >>>>> to tell us. Show us the code that failed to provide "labels." See the
>>> >>>>> posting guide below for how to post questions that are likely to
>>> >>>>> elicit helpful responses.
>>> >>>>>
>>> >>>>> I know nothing about the haven package, but see ?factor or go through
>>> >>>>> an R tutorial or two to learn about factors, which may be part of the
>>> >>>>> issue here. R *generally* obtains whatever "label" info it needs from
>>> >>>>> the object being tabled -- see ?tabulate, ?table etc. -- if that's
>>> >>>>> what you're doing.
>>> >>>>>
>>> >>>>> Bert Gunter
>>> >>>>>
>>> >>>>>
>>> >>>>> On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi <yawo1964 using gmail.com>
>>> >>> wrote:
>>> >>>>>
>>> >>>>>> Hello,
>>> >>>>>>
>>> >>>>>> I am just transitioning from SPSS to R.
>>> >>>>>>
>>> >>>>>> I used the haven library to import some of my spss data files to R.
>>> >>>>>>
>>> >>>>>> However, when I run procedures such as frequencies or crosstabs,
>>> >>>>>> value labels for categorical variables such as gender (1=male,
>>> >>>>>> 2=female) are not shown. The same applies to many other outputs.
>>> >>>>>>
>>> >>>>>> I am confused.
>>> >>>>>>
>>> >>>>>> 1. Is there a global setting that I can use to force all categorical
>>> >>>>>> variables to display labels?
>>> >>>>>>
>>> >>>>>> 2. Or, are these labels to be set for each function or package?
>>> >>>>>>
>>> >>>>>> 3. How can I request the value labels for each function I run?
>>> >>>>>>
>>> >>>>>> Thanks in advance for your help..
>>> >>>>>>
>>> >>>>>> Best, Yawo
>>> >>>>>>
>>> >>>>>> [[alternative HTML version deleted]]
>>> >>>>>>
>>> >>>>>> ______________________________________________
>>> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>>>>> PLEASE do read the posting guide
>>> >>>>>> http://www.R-project.org/posting-guide.html
>>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> >>>>>>
>>> >>>>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> John Kane
>>> >>> Kingston ON Canada
>>> >>>
>>> >>
>>> >
>>
>>
>> --
>> John Kane
>> Kingston ON Canada
>>
>
--
John Kane
Kingston ON Canada