[R] Value Labels: SPSS Dataset to R
Gregory Demin
gdem|n @end|ng |rom gm@||@com
Sat Feb 8 18:36:28 CET 2020
Hi,
With the 'expss' package, the code for your task looks like this:
library(haven)
library(expss) # it is important to load expss after haven
# CatsDogs = read_spss("path_to_file")
CatsDogs = structure(
  list(
    Animal = structure(
      c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
      label = "Animal",
      labels = c(Cat = 0, Dog = 1),
      class = "haven_labelled"
    ),
    Training = structure(
      c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
      label = "Type of Training",
      labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
      class = "haven_labelled"
    ),
    Dance = structure(
      c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
      label = "Did they dance?",
      labels = c(No = 0, Yes = 1),
      class = "haven_labelled"
    )
  ),
  row.names = c(NA, -10L),
  class = c("tbl_df", "tbl", "data.frame")
)
# set the "labelled" class for variables with labels
CatsDogs = add_labelled_class(CatsDogs)
# frequencies
fre(list(CatsDogs$Training, CatsDogs$Dance))
# |                  |                     | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
# | ---------------- | ------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
# | Type of Training |      Food as Reward |     6 |            60 |      60 |           60 |                      60 |
# |                  | Affection as Reward |     4 |            40 |      40 |           40 |                     100 |
# |                  |              #Total |    10 |           100 |     100 |          100 |                         |
# |                  |                <NA> |     0 |               |       0 |              |                         |
# |  Did they dance? |                  No |     3 |            30 |      30 |           30 |                      30 |
# |                  |                 Yes |     7 |            70 |      70 |           70 |                     100 |
# |                  |              #Total |    10 |           100 |     100 |          100 |                         |
# |                  |                <NA> |     0 |               |       0 |              |                         |
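For comparison, roughly the same counts and percentages can be obtained in base R without extra packages. This is only a sketch under some assumptions: the helper name labelled_to_factor is hypothetical (it is not part of haven or expss), and the Training values are retyped from the example data above.

```r
# Hypothetical helper (not from haven or expss): turn a labelled vector
# into a factor using its "labels" attribute.
labelled_to_factor <- function(x) {
  value_labels <- attr(x, "labels")
  factor(as.vector(unclass(x)),
         levels = unname(value_labels),
         labels = names(value_labels))
}

# Training column retyped from the example data
Training <- structure(
  c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  label = "Type of Training",
  labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
  class = "haven_labelled"
)

training_factor <- labelled_to_factor(Training)
counts <- table(training_factor)
percent <- 100 * prop.table(counts)

# One row per value label: count, percent, cumulative percent
freq <- data.frame(
  Label = names(counts),
  Count = as.vector(counts),
  Percent = as.vector(percent),
  Cumulative = cumsum(as.vector(percent))
)
print(freq)
```

Unlike fre(), this sketch does not report missing values or a total row; those would have to be added by hand.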
# barplots
use_labels(CatsDogs, barplot(table(Training), legend.text = TRUE))
use_labels(CatsDogs, barplot(table(Dance), legend.text = TRUE))
use_labels(CatsDogs, barplot(table(Dance, Training), legend.text = TRUE))
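The last barplot is driven by a two-way table of Dance by Training. As a rough illustration of what that table contains, here is a base R sketch. The helper labelled_to_factor is a hypothetical name (not part of haven or expss), and the two columns are retyped from the example data above.

```r
# Hypothetical helper (not part of haven or expss): map the numeric codes of
# a labelled vector to a factor via its "labels" attribute.
labelled_to_factor <- function(x) {
  value_labels <- attr(x, "labels")
  factor(as.vector(unclass(x)),
         levels = unname(value_labels),
         labels = names(value_labels))
}

# Columns retyped from the example data
Training <- structure(
  c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  label = "Type of Training",
  labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
  class = "haven_labelled"
)
Dance <- structure(
  c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
  label = "Did they dance?",
  labels = c(No = 0, Yes = 1),
  class = "haven_labelled"
)

# Rows are Dance, columns are Training
counts <- table(
  Dance = labelled_to_factor(Dance),
  Training = labelled_to_factor(Training)
)
print(counts)

# Draw only in an interactive session; the row names become the legend
if (interactive()) {
  barplot(counts, legend.text = TRUE, xlab = attr(Training, "label"))
}
```

The use_labels() calls above do this relabelling for you; the sketch only makes the intermediate table visible.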
Regards,
Gregory
On Sat, 8 Feb 2020 at 18:36, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
>
> Thanks again - I realized after posting that sjlabelled is indirectly
> referencing haven's read_sav function. For a moment I thought you were
> referring to the read.spss under the older foreign package. But then
> realized that read_sav and read_spss are equivalent. So that's clear now.
>
> And I also realized there are so many ways to do the same thing in R - so
> as part of learning, I am discovering these different ways, and knowing
> when to use one over the other.
>
> Thanks for the references - I will read further on them.
>
> cheers, cY
>
> On Sat, Feb 8, 2020 at 10:28 AM John Kane <jrkrideau using gmail.com> wrote:
>
> > "use a different function (read_spss) as John has suggested to import the
> > file. "
> >
> > No! As far as I can see, sjlabelled is simply using haven's function
> > read_sav() to read in the data. It is just wrapped in the read_spss()
> > function. There should be no difference between read_sav("sdata.sav") and
> > read_spss("sdata.sav").
> >
> > It just seems to keep the code simpler (more aesthetically pleasing?) if
> > you do not load more packages than needed. Likewise you do not need to load
> > "labelled" as sjlabelled is taking care of this for you.
> >
> > Oh, BTW Scratch$sex %>% attr('labels') can be replaced by something like
> > get_labels(dat1) in my example. There usually are a multitude of ways to do
> > the same thing in R.
> >
> > You might want to have a look at
> > https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
> > and https://strengejacke.github.io/sjlabelled/articles/labelleddata.html
> > for more about working with labels.
> >
> > On Sat, 8 Feb 2020 at 09:35, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >
> >> Thanks so much for all your assistance. I admit R's learning curve is a
> >> bit steep, but I am eager to learn ... and hopefully teach with it.
> >>
> >> with regard to my problem, I can now see two options: either declare
> >> each categorical variable as factors, specifying the needed levels and
> >> labels.
> >>
> >> OR
> >>
> >> use a different function (read_spss) as John has suggested to import the
> >> file.
> >>
> >> I will experiment with both.
> >>
> >> With much appreciation, cY
> >>
> >> On Sat, Feb 8, 2020 at 9:25 AM John Kane <jrkrideau using gmail.com> wrote:
> >>
> >>> Hi Yawo Kokuvi;
> >>> As an R newbie transitioning from SPSS to R, expect culture shock and the
> >>> possible feeling that your brain is twisting within your skull, but it is
> >>> well worth it.
> >>>
> >>> Try something like this:
> >>> ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> dat1 <- structure(list(
> >>>   Animal = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>     label = "Animal", labels = c(Cat = 0, Dog = 1),
> >>>     class = "haven_labelled"),
> >>>   Training = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>     label = "Type of Training",
> >>>     labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
> >>>     class = "haven_labelled"),
> >>>   Dance = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
> >>>     label = "Did they dance?", labels = c(No = 0, Yes = 1),
> >>>     class = "haven_labelled")),
> >>>   row.names = c(NA, -10L),
> >>>   class = c("tbl_df", "tbl", "data.frame"))
> >>>
> >>>
> >>> library(sjlabelled)
> >>> str(dat1)
> >>> get_labels(dat1)
> >>> barplot(table(as_label(dat1$Dance)))
> >>> ##==================================================================
> >>> Your problem seems to be omitting the as_label().
> >>>
> >>> You do not need to load "haven";
> >>> read_spss() in sjlabelled should do the trick.
> >>>
> >>>
> >>> On Sat, 8 Feb 2020 at 05:44, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> Try
> >>>>
> >>>> aux_fun <- function(x){
> >>>> levels <- attr(x, "labels")
> >>>> factor(x, labels = names(levels), levels = levels)
> >>>> }
> >>>>
> >>>> newCatsDogs <- as.data.frame(lapply(CatsDogs, aux_fun))
> >>>>
> >>>> str(newCatsDogs)
> >>>> #'data.frame': 10 obs. of 3 variables:
> >>>> # $ Animal : Factor w/ 2 levels "Cat","Dog": 1 1 1 1 1 1 1 1 1 1
> >>>> # $ Training: Factor w/ 2 levels "Food as Reward",..: 1 1 1 1 1 1 1 1 1 1
> >>>> # $ Dance : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2
> >>>>
> >>>>
> >>>> As for the
> >>>> - frequencies: ?table, ?tapply, ?aggregate,
> >>>> - barplots: ?barplot
> >>>>
> >>>> You can find lots and lots of examples online of both, covering what
> >>>> seem to be simple use cases.
> >>>>
> >>>> Hope this helps,
> >>>>
> >>>> Rui Barradas
> >>>>
> >>>> At 06:03 on 08/02/20, Yawo Kokuvi wrote:
> >>>> > Thanks for all. Here is output from dput. I used a different dataset
> >>>> > containing categorical variables since the previous one is on a
> >>>> > different computer.
> >>>> >
> >>>> > In the following dataset, my interest is in getting frequencies and
> >>>> > barplots for the two variables: Training and Dance, with value labels
> >>>> > displayed.
> >>>> >
> >>>> > thanks again - cY
> >>>> >
> >>>> >
> >>>> > =========
> >>>> > dput(head(CatsDogs, n = 10))
> >>>> > structure(
> >>>> > list(
> >>>> > Animal = structure(
> >>>> > c(0, 0, 0, 0, 0, 0, 0, 0, 0,
> >>>> > 0),
> >>>> > label = "Animal",
> >>>> > labels = c(Cat = 0, Dog = 1),
> >>>> > class = "haven_labelled"
> >>>> > ),
> >>>> > Training = structure(
> >>>> > c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>> > label = "Type of Training",
> >>>> > labels = c(`Food as Reward` = 0,
> >>>> > `Affection as Reward` = 1),
> >>>> > class = "haven_labelled"
> >>>> > ),
> >>>> > Dance = structure(
> >>>> > c(1,
> >>>> > 1, 1, 1, 1, 1, 1, 1, 1, 1),
> >>>> > label = "Did they dance?",
> >>>> > labels = c(No = 0,
> >>>> > Yes = 1),
> >>>> > class = "haven_labelled"
> >>>> > )
> >>>> > ),
> >>>> > row.names = c(NA,-10L),
> >>>> > class = c("tbl_df", "tbl", "data.frame")
> >>>> > )
> >>>> >
> >>>> >
> >>>> > On Fri, Feb 7, 2020 at 10:14 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >>>> >
> >>>> >> Yes. Most attachments are stripped by the server.
> >>>> >>
> >>>> >> Bert Gunter
> >>>> >>
> >>>> >> "The trouble with having an open mind is that people keep coming along and
> >>>> >> sticking things into it."
> >>>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Feb 7, 2020 at 5:34 PM John Kane <jrkrideau using gmail.com> wrote:
> >>>> >>
> >>>> >>> Hi,
> >>>> >>> Could you upload some sample data in dput form? Something like
> >>>> >>> dput(head(Scratch, n=13)) will give us some real data to examine. Just copy
> >>>> >>> and paste the output of dput(head(Scratch, n=13)) into the email. This is
> >>>> >>> the best way to ensure that R-help denizens are getting the data in the
> >>>> >>> exact format that you have.
> >>>> >>>
> >>>> >>> On Fri, 7 Feb 2020 at 15:32, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >>>> >>>
> >>>> >>>> Thanks for all your assistance
> >>>> >>>>
> >>>> >>>> Attached please is the Rdata scratch I have been using
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>> > head(Scratch, n=13)
> >>>> >>>> # A tibble: 13 x 6
> >>>> >>>>       ID marital           sex        race      paeduc    speduc
> >>>> >>>>    <dbl> <dbl+lbl>         <dbl+lbl>  <dbl+lbl> <dbl+lbl> <dbl+lbl>
> >>>> >>>>  1     1 3 [DIVORCED]      1 [MALE]   1 [WHITE] NA        NA
> >>>> >>>>  2     2 1 [MARRIED]       1 [MALE]   1 [WHITE] NA        NA
> >>>> >>>>  3     3 3 [DIVORCED]      1 [MALE]   1 [WHITE] 4         NA
> >>>> >>>>  4     4 4 [SEPARATED]     1 [MALE]   1 [WHITE] 16        NA
> >>>> >>>>  5     5 3 [DIVORCED]      1 [MALE]   1 [WHITE] 18        NA
> >>>> >>>>  6     6 1 [MARRIED]       2 [FEMALE] 1 [WHITE] 14        20
> >>>> >>>>  7     7 1 [MARRIED]       2 [FEMALE] 2 [BLACK] NA        12
> >>>> >>>>  8     8 1 [MARRIED]       2 [FEMALE] 1 [WHITE] NA        12
> >>>> >>>>  9     9 3 [DIVORCED]      2 [FEMALE] 1 [WHITE] 11        NA
> >>>> >>>> 10    10 1 [MARRIED]       2 [FEMALE] 1 [WHITE] 16        12
> >>>> >>>> 11    11 5 [NEVER MARRIED] 2 [FEMALE] 2 [BLACK] NA        NA
> >>>> >>>> 12    12 3 [DIVORCED]      2 [FEMALE] 2 [BLACK] NA        NA
> >>>> >>>> 13    13 3 [DIVORCED]      2 [FEMALE] 2 [BLACK] 16        NA
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>> and below is my script/command file.
> >>>> >>>>
> >>>> >>>> *#1: Load library and import SPSS dataset*
> >>>> >>>> library(haven)
> >>>> >>>> Scratch <- read_sav("~/Desktop/Scratch.sav")
> >>>> >>>>
> >>>> >>>> *#2: save the dataset with a name*
> >>>> >>>> save(Scratch, file="Scratch.Rdata")
> >>>> >>>>
> >>>> >>>> *#3: install & load necessary packages for descriptive statistics*
> >>>> >>>> install.packages ("freqdist")
> >>>> >>>> library (freqdist)
> >>>> >>>>
> >>>> >>>> install.packages ("sjlabelled")
> >>>> >>>> library (sjlabelled)
> >>>> >>>>
> >>>> >>>> install.packages ("labelled")
> >>>> >>>> library (labelled)
> >>>> >>>>
> >>>> >>>> install.packages ("surveytoolbox")
> >>>> >>>> library (surveytoolbox)
> >>>> >>>>
> >>>> >>>> *#4: Check the value labels of gender and marital status*
> >>>> >>>> Scratch$sex %>% attr('labels')
> >>>> >>>> Scratch$marital %>% attr('labels')
> >>>> >>>>
> >>>> >>>> *#5: Frequency Distribution and BarChart for Categorical/Ordinal Level
> >>>> >>>> Variables such as Gender - SEX*
> >>>> >>>> freqdist(Scratch$sex)
> >>>> >>>> barplot(table(Scratch$marital))
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>> As you can see from above, I use the <haven> package to import the data
> >>>> >>>> from SPSS. Apparently, the haven function keeps the value labels, as the
> >>>> >>>> attribute output in section #4 of my script shows.
> >>>> >>>> The problem is that when I run a frequency distribution for any of the
> >>>> >>>> categorical variables like sex or marital status, only the numbers (1, 2)
> >>>> >>>> are displayed in the output. The labels (male, female), for example, are
> >>>> >>>> not.
> >>>> >>>>
> >>>> >>>> Is there any way to force these to be shown in the output? Is there a
> >>>> >>>> global property that I have to set so that these value labels are reliably
> >>>> >>>> displayed with every output? I read I can declare them as factors using
> >>>> >>>> <as_factor()>, but once I do so, how do I invoke them in my commands so
> >>>> >>>> that the value labels show...
> >>>> >>>>
> >>>> >>>> Sorry about all the noob questions, but hopefully I will be able to get this
> >>>> >>>> working.
> >>>> >>>>
> >>>> >>>> Thanks in advance.
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> Thanks - cY
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> On Fri, Feb 7, 2020 at 1:14 PM <cpolwart using chemo.org.uk> wrote:
> >>>> >>>>
> >>>> >>>>> I've never used it, but there is a labels function in haven...
> >>>> >>>>>
> >>>> >>>>> On 7 Feb 2020 17:05, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >>>> >>>>>
> >>>> >>>>> What does your data look like after importing? -- see ?head and ?str to
> >>>> >>>>> tell us. Show us the code that failed to provide "labels." See the posting
> >>>> >>>>> guide below for how to post questions that are likely to elicit helpful
> >>>> >>>>> responses.
> >>>> >>>>>
> >>>> >>>>> I know nothing about the haven package, but see ?factor or go through an R
> >>>> >>>>> tutorial or two to learn about factors, which may be part of the issue
> >>>> >>>>> here. R *generally* obtains whatever "label" info it needs from the object
> >>>> >>>>> being tabled -- see ?tabulate, ?table etc. -- if that's what you're doing.
> >>>> >>>>>
> >>>> >>>>> Bert Gunter
> >>>> >>>>>
> >>>> >>>>> "The trouble with having an open mind is that people keep coming along and
> >>>> >>>>> sticking things into it."
> >>>> >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>> On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >>>> >>>>>
> >>>> >>>>>> Hello,
> >>>> >>>>>>
> >>>> >>>>>> I am just transitioning from SPSS to R.
> >>>> >>>>>>
> >>>> >>>>>> I used the haven library to import some of my spss data files to R.
> >>>> >>>>>>
> >>>> >>>>>> However, when I run procedures such as frequencies or crosstabs, value
> >>>> >>>>>> labels for categorical variables such as gender (1=male, 2=female) are not
> >>>> >>>>>> shown. The same applies to many other outputs.
> >>>> >>>>>>
> >>>> >>>>>> I am confused.
> >>>> >>>>>>
> >>>> >>>>>> 1. Is there a global setting that I can use to force all categorical
> >>>> >>>>>> variables to display labels?
> >>>> >>>>>>
> >>>> >>>>>> 2. Or, are these labels to be set for each function or package?
> >>>> >>>>>>
> >>>> >>>>>> 3. How can I request the value labels for each function I run?
> >>>> >>>>>>
> >>>> >>>>>> Thanks in advance for your help..
> >>>> >>>>>>
> >>>> >>>>>> Best, Yawo
> >>>> >>>>>>
> >>>> >>>>>> [[alternative HTML version deleted]]
> >>>> >>>>>>
> >>>> >>>>>> ______________________________________________
> >>>> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> >>>>>> PLEASE do read the posting guide
> >>>> >>>>>> http://www.R-project.org/posting-guide.html
> >>>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>> >>>>>>
> >>>> >>>>>
> >>>> >>>>
> >>>> >>>
> >>>> >>>
> >>>> >>> --
> >>>> >>> John Kane
> >>>> >>> Kingston ON Canada
> >>>> >>>
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>>
> >>> --
> >>> John Kane
> >>> Kingston ON Canada
> >>>
> >>
> >
> > --
> > John Kane
> > Kingston ON Canada
> >
>