[R] Value Labels: SPSS Dataset to R

Yawo Kokuvi yawo1964 using gmail.com
Sat Feb 8 16:36:25 CET 2020


Thanks again - I realized after posting that sjlabelled is indirectly
referencing haven's read_sav function.  For a moment I thought you were
referring to the read.spss under the older foreign package.  But then I
realized that read_sav and read_spss are equivalent. So that's clear now.

And I also realized there are so many ways to do the same thing in R - so
as part of learning, I am discovering these different ways, and knowing
when to use one over the other.
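
To make the "many ways" point concrete, here is a minimal base-R sketch of
what seems to be going on under the hood. The vector `sex` below is a made-up
stand-in for a column imported by haven (numeric codes plus a "labels"
attribute), not the real Scratch data. As far as I understand, haven's
as_factor() and sjlabelled's as_label() are packaged routes to the same
result.

```r
# Toy stand-in for a column imported by haven::read_sav(): numeric codes
# plus a "labels" attribute (hypothetical data, not the real Scratch file).
sex <- structure(c(1, 1, 2, 2, 2),
                 label  = "Respondent sex",
                 labels = c(MALE = 1, FEMALE = 2))

# Way 1: read the value labels straight off the attribute.
value_labels <- attr(sex, "labels")

# Way 2: build a factor by hand so that table() reports labels, not codes.
# as.vector() drops the attributes, leaving the bare numeric codes.
sex_factor <- factor(as.vector(sex),
                     levels = unname(value_labels),
                     labels = names(value_labels))

# Frequency table keyed by label.
freq <- table(sex_factor)
print(freq)
```

With the factor in hand, barplot(freq) should show MALE and FEMALE on the
axis instead of 1 and 2.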

Thanks for the references - I will read further on them.

cheers, cY

On Sat, Feb 8, 2020 at 10:28 AM John Kane <jrkrideau using gmail.com> wrote:

> "use a different function (read_spss) as John has suggested to import the
> file. "
>
> No! As far as I can see, sjlabelled is simply using haven's function
> "read_sav()" to read in the data. It is just wrapped in the "read_spss()"
> function. There should be no difference between read_sav("sdata.sav") and
> read_spss("sdata.sav").
>
> It just seems to keep the code simpler (more aesthetically pleasing?) if
> you do not load more packages than needed. Likewise you do not need to load
> "labelled" as sjlabelled is taking care of this for you.
>
> Oh, BTW  Scratch$sex %>% attr('labels') can be replaced by something like
> get_labels(dat1) in my example. There usually are a multitude of ways to do
> the same thing in R.
>
> You might want to have a look at
> https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
> and https://strengejacke.github.io/sjlabelled/articles/labelleddata.html
> for more about working with labels.
>
> On Sat, 8 Feb 2020 at 09:35, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
>
>> Thanks so much for all your assistance.  I admit R's learning curve is a
>> bit steep, but I am eager to learn ... and hopefully teach with it.
>>
>> with regard to my problem, I can now see two options:  either declare
>> each categorical variable as factors, specifying the needed levels and
>> labels.
>>
>> OR
>>
>> use a different function (read_spss) as John has suggested to import the
>> file.
>>
>> I will experiment with both.
>>
>> With much appreciation, cY
>>
>> On Sat, Feb 8, 2020 at 9:25 AM John Kane <jrkrideau using gmail.com> wrote:
>>
>>> Hi Yawo Kokuvi;
>>> As an R newbie transitioning from SPSS to R, expect culture shock and the
>>> possible feeling that your brain is twisting within your skull, but it is
>>> well worth it.
>>>
>>> Try something like this:
>>> ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> dat1 <- structure(list(
>>>     Animal = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>>         label = "Animal", labels = c(Cat = 0, Dog = 1),
>>>         class = "haven_labelled"),
>>>     Training = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>>         label = "Type of Training",
>>>         labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
>>>         class = "haven_labelled"),
>>>     Dance = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
>>>         label = "Did they dance?", labels = c(No = 0, Yes = 1),
>>>         class = "haven_labelled")),
>>>     row.names = c(NA, -10L),
>>>     class = c("tbl_df", "tbl", "data.frame"))
>>>
>>>
>>> library(sjlabelled)
>>> str(dat1)
>>> get_labels(dat1)
>>> barplot(table(as_label(dat1$Dance)))
>>> ##==================================================================
>>> Your problem seems to be omitting the as_label() call.
>>>
>>> You do not need to load "haven"
>>> read_spss() in sjlabelled should do the trick.
>>>
>>>
>>> On Sat, 8 Feb 2020 at 05:44, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>>
>>>> Hello,
>>>>
>>>> Try
>>>>
>>>> aux_fun <- function(x){
>>>>    levels <- attr(x, "labels")
>>>>    factor(x, labels = names(levels), levels = levels)
>>>> }
>>>>
>>>> newCatsDogs <- as.data.frame(lapply(CatsDogs, aux_fun))
>>>>
>>>> str(newCatsDogs)
>>>> #'data.frame':  10 obs. of  3 variables:
>>>> # $ Animal  : Factor w/ 2 levels "Cat","Dog": 1 1 1 1 1 1 1 1 1 1
>>>> # $ Training: Factor w/ 2 levels "Food as Reward",..: 1 1 1 1 1 1 1 1 1 1
>>>> # $ Dance   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2
>>>>
>>>>
>>>> As for the
>>>>   - frequencies: ?table, ?tapply, ?aggregate,
>>>>   - barplots: ?barplot
>>>>
>>>> You can find lots and lots of examples of both online, covering what
>>>> seem to be simple use cases.
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 06:03 de 08/02/20, Yawo Kokuvi escreveu:
>>>> > Thanks for all. Here is output from dput.  I used a different dataset
>>>> > containing categorical variables since the previous one is on a
>>>> > different computer.
>>>> >
>>>> > In the following dataset, my interest is in getting frequencies and
>>>> > barplots for the two variables: Training and Dance, with value labels
>>>> > displayed.
>>>> >
>>>> > thanks again - cY
>>>> >
>>>> >
>>>> > =========
>>>> > dput(head(CatsDogs, n = 10))
>>>> > structure(
>>>> >    list(
>>>> >      Animal = structure(
>>>> >        c(0, 0, 0, 0, 0, 0, 0, 0, 0,
>>>> >          0),
>>>> >        label = "Animal",
>>>> >        labels = c(Cat = 0, Dog = 1),
>>>> >        class = "haven_labelled"
>>>> >      ),
>>>> >      Training = structure(
>>>> >        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>>> >        label = "Type of Training",
>>>> >        labels = c(`Food as Reward` = 0,
>>>> >                   `Affection as Reward` = 1),
>>>> >        class = "haven_labelled"
>>>> >      ),
>>>> >      Dance = structure(
>>>> >        c(1,
>>>> >          1, 1, 1, 1, 1, 1, 1, 1, 1),
>>>> >        label = "Did they dance?",
>>>> >        labels = c(No = 0,
>>>> >                   Yes = 1),
>>>> >        class = "haven_labelled"
>>>> >      )
>>>> >    ),
>>>> >    row.names = c(NA,-10L),
>>>> >    class = c("tbl_df", "tbl", "data.frame")
>>>> > )
>>>> >
>>>> >
>>>> > On Fri, Feb 7, 2020 at 10:14 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>> >
>>>> >> Yes. Most attachments are stripped by the server.
>>>> >>
>>>> >> Bert Gunter
>>>> >>
>>>> >> "The trouble with having an open mind is that people keep coming along
>>>> >> and sticking things into it."
>>>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>> >>
>>>> >>
>>>> >> On Fri, Feb 7, 2020 at 5:34 PM John Kane <jrkrideau using gmail.com> wrote:
>>>> >>
>>>> >>> Hi,
>>>> >>> Could you upload some sample data in dput form?  Something like
>>>> >>> dput(head(Scratch, n=13)) will give us some real data to examine. Just
>>>> >>> copy and paste the output of dput(head(Scratch, n=13)) into the email.
>>>> >>> This is the best way to ensure that R-help denizens are getting the
>>>> >>> data in the exact format that you have.
>>>> >>>
>>>> >>> On Fri, 7 Feb 2020 at 15:32, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
>>>> >>>
>>>> >>>> Thanks for all your assistance
>>>> >>>>
>>>> >>>> Attached please is the Rdata scratch I have been using
>>>> >>>>
>>>> >>>> -----------------------------------------------------
>>>> >>>>
>>>> >>>>> head(Scratch, n=13)
>>>> >>>> # A tibble: 13 x 6
>>>> >>>>       ID marital           sex        race      paeduc    speduc
>>>> >>>>    <dbl> <dbl+lbl>         <dbl+lbl>  <dbl+lbl> <dbl+lbl> <dbl+lbl>
>>>> >>>>  1     1 3 [DIVORCED]      1 [MALE]   1 [WHITE]        NA        NA
>>>> >>>>  2     2 1 [MARRIED]       1 [MALE]   1 [WHITE]        NA        NA
>>>> >>>>  3     3 3 [DIVORCED]      1 [MALE]   1 [WHITE]         4        NA
>>>> >>>>  4     4 4 [SEPARATED]     1 [MALE]   1 [WHITE]        16        NA
>>>> >>>>  5     5 3 [DIVORCED]      1 [MALE]   1 [WHITE]        18        NA
>>>> >>>>  6     6 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        14        20
>>>> >>>>  7     7 1 [MARRIED]       2 [FEMALE] 2 [BLACK]        NA        12
>>>> >>>>  8     8 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        NA        12
>>>> >>>>  9     9 3 [DIVORCED]      2 [FEMALE] 1 [WHITE]        11        NA
>>>> >>>> 10    10 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        16        12
>>>> >>>> 11    11 5 [NEVER MARRIED] 2 [FEMALE] 2 [BLACK]        NA        NA
>>>> >>>> 12    12 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]        NA        NA
>>>> >>>> 13    13 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]        16        NA
>>>> >>>>
>>>> >>>> -----------------------------------------------------
>>>> >>>>
>>>> >>>> and below is my script/command file.
>>>> >>>>
>>>> >>>> *#1: Load library and import SPSS dataset*
>>>> >>>> library(haven)
>>>> >>>> Scratch <- read_sav("~/Desktop/Scratch.sav")
>>>> >>>>
>>>> >>>> *#2: save the dataset with a name*
>>>> >>>> save(Scratch, file="Scratch.Rdata")
>>>> >>>>
>>>> >>>> *#3: install & load necessary packages for descriptive statistics*
>>>> >>>> install.packages ("freqdist")
>>>> >>>> library (freqdist)
>>>> >>>>
>>>> >>>> install.packages ("sjlabelled")
>>>> >>>> library (sjlabelled)
>>>> >>>>
>>>> >>>> install.packages ("labelled")
>>>> >>>> library (labelled)
>>>> >>>>
>>>> >>>> install.packages ("surveytoolbox")
>>>> >>>> library (surveytoolbox)
>>>> >>>>
>>>> >>>> *#4: Check the value labels of gender and marital status*
>>>> >>>> Scratch$sex %>% attr('labels')
>>>> >>>> Scratch$marital %>% attr('labels')
>>>> >>>>
>>>> >>>> *#5:  Frequency Distribution and BarChart for Categorical/Ordinal Level
>>>> >>>> Variables such as Gender - SEX*
>>>> >>>> freqdist(Scratch$sex)
>>>> >>>> barplot(table(Scratch$marital))
>>>> >>>>
>>>> >>>> -----------------------------------------------------
>>>> >>>>
>>>> >>>> As you can see from above, I use the <haven> package to import the data
>>>> >>>> from SPSS.  Apparently, the haven function keeps the value labels, as the
>>>> >>>> attribute options in section #4 of my script show.
>>>> >>>> The problem is that when I run frequency distribution for any of the
>>>> >>>> categorical variables like sex or marital status, only the numbers (1, 2)
>>>> >>>> are displayed in the output.  The labels (male, female) for example are
>>>> >>>> not.
>>>> >>>>
>>>> >>>> Is there any way to force these to be shown in the output?  Is there a
>>>> >>>> global property that I have to set so that these value labels are reliably
>>>> >>>> displayed with every output?  I read I can declare them as factors using
>>>> >>>> the <as_factor()>, but once I do so, how do I invoke them in my commands so
>>>> >>>> that the value labels show...
>>>> >>>>
>>>> >>>> Sorry about all the noob questions, but hopefully I am able to get this
>>>> >>>> working.
>>>> >>>>
>>>> >>>> Thanks in advance.
>>>> >>>>
>>>> >>>>
>>>> >>>> Thanks - cY
>>>> >>>>
>>>> >>>>
>>>> >>>> On Fri, Feb 7, 2020 at 1:14 PM <cpolwart using chemo.org.uk> wrote:
>>>> >>>>
>>>> >>>>> I've never used it, but there is a labels function in haven...
>>>> >>>>>
>>>> >>>>> On 7 Feb 2020 17:05, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> What does your data look like after importing? -- see ?head and ?str to
>>>> >>>>> tell us. Show us the code that failed to provide "labels." See the posting
>>>> >>>>> guide below for how to post questions that are likely to elicit helpful
>>>> >>>>> responses.
>>>> >>>>>
>>>> >>>>> I know nothing about the haven package, but see ?factor or go through an R
>>>> >>>>> tutorial or two to learn about factors, which may be part of the issue
>>>> >>>>> here. R *generally* obtains whatever "label" info it needs from the object
>>>> >>>>> being tabled -- see ?tabulate, ?table etc. -- if that's what you're doing.
>>>> >>>>>
>>>> >>>>> Bert Gunter
>>>> >>>>>
>>>> >>>>> "The trouble with having an open mind is that people keep coming along
>>>> >>>>> and sticking things into it."
>>>> >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi <yawo1964 using gmail.com> wrote:
>>>> >>>>>
>>>> >>>>>> Hello,
>>>> >>>>>>
>>>> >>>>>> I am just transitioning from SPSS to R.
>>>> >>>>>>
>>>> >>>>>> I used the haven library to import some of my SPSS data files to R.
>>>> >>>>>>
>>>> >>>>>> However, when I run procedures such as frequencies or crosstabs, value
>>>> >>>>>> labels for categorical variables such as gender (1=male, 2=female) are not
>>>> >>>>>> shown. The same applies to many other outputs.
>>>> >>>>>>
>>>> >>>>>> I am confused.
>>>> >>>>>>
>>>> >>>>>> 1. Is there a global setting that I can use to force all categorical
>>>> >>>>>> variables to display labels?
>>>> >>>>>>
>>>> >>>>>> 2. Or, are these labels to be set for each function or package?
>>>> >>>>>>
>>>> >>>>>> 3. How can I request the value labels for each function I run?
>>>> >>>>>>
>>>> >>>>>> Thanks in advance for your help..
>>>> >>>>>>
>>>> >>>>>> Best, Yawo
>>>> >>>>>>
>>>> >>>>>>          [[alternative HTML version deleted]]
>>>> >>>>>>
>>>> >>>>>> ______________________________________________
>>>> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> >>>>>> PLEASE do read the posting guide
>>>> >>>>>> http://www.R-project.org/posting-guide.html
>>>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> John Kane
>>>> >>> Kingston ON Canada
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>> --
>>> John Kane
>>> Kingston ON Canada
>>>
>>
>
> --
> John Kane
> Kingston ON Canada
>
