[R] Approach for Storing Result Data

Wed Mar 8 17:44:17 CET 2017

This does not appear to be a legitimate topic for r-help: it is not a
consulting service. Please see the posting guide.

Of course, others may disagree and reply. Wouldn't be the first time I'm wrong.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Mar 8, 2017 at 7:27 AM,  <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> today I have a more general question concerning the approach of storing
> different values from the analysis of multiple variables.
>
> My task is to compare distributions in a universe with distributions from
> the respondents using a whole bunch of variables. Comparison shall be done
> on relative frequencies (proportions).
>
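
A comparison on relative frequencies could be sketched as below. This is
only a minimal illustration in base R: the vectors `universe` and
`respondents` and their counts are made up for the example and are not
taken from the survey.

```r
# Hypothetical raw counts per category; the numbers are made up.
universe    <- c(Bayern = 500, Berlin = 200, Hessen = 300)
respondents <- c(Bayern = 45,  Berlin = 25,  Hessen = 30)

# prop.table() divides each count by the total, giving relative frequencies;
# multiplying by 100 turns them into percentages.
shares <- rbind(
  Universe    = prop.table(universe),
  Respondents = prop.table(respondents)
) * 100

# Difference in percentage points: respondents minus universe.
diff_pp <- shares["Respondents", ] - shares["Universe", ]
round(diff_pp, 1)
```

Keeping both groups as rows of one matrix makes it easy to see where the
respondents over- or under-represent a category.
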
> I was thinking about the structure I should store the results in and came
> up with the following:
>
> -- cut --
>
> library(ggplot2)
> library(stringi)
>
> # Result data frame
> # Some sort of tidy data set where
> # each value is stored as an identity.
> # This way all values for all variables could be stored in
> # one unique data structure.
> # If an additional variable were added for the name of the
> # study, one could also build a result data set across
> # surveys.
> # Values for measure could be "number" for 'raw' values or
> # "freq" for frequencies/counts.
> # Values for unit could be "n" for 'numbers' and
> # "%" for percentages.
> d_test <- data.frame(
>     group = rep(c("Universe", "Respondents"), each = 16),
>     variable = rep("State", 32),
>     value = rep(c(11.3,
>                     12.7,
>                     3.3,
>                     5,
>                     0.6,
>                     8.1,
>                     6.2,
>                     5.8,
>                     6.4,
>                     14.5,
>                     8.3,
>                     0.3,
>                     3.8,
>                     2.5,
>                     8.1,
>                     3), 2),
>     label = rep(c("Baden-Wuerttemberg",
>                 "Bayern",
>                 "Berlin",
>                 "Brandenburg",
>                 "Bremen",
>                 "Hamburg",
>                 "Hessen",
>                 "Mecklenburg-Vorpommern",
>                 "Niedersachsen",
>                 "Nordrhein-Westfalen",
>                 "Rheinland-Pfalz",
>                 "Saarland",
>                 "Sachsen",
>                 "Sachsen-Anhalt",
>                 "Schleswig-Holstein",
>                 "Thueringen"),2),
>     measure = rep("freq", 32),
>     unit = rep("%", 32),
>     stringsAsFactors = FALSE
> )
>
> # This way the variables can be selected using simple
> # value selection from Base R functionality.
> data <- d_test[d_test$variable == "State" ,]
>
> # And plot results for every variable.
> ggplot(
>   data = data,
>   aes(
>     x = label,
>     y = value,
>     fill = group)) +
>   geom_bar(stat = "identity", position = "dodge") +
>   theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
>   scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
>   scale_x_discrete(name = data$variable[1]) +
>   # value is numeric, so the y axis needs a continuous scale
>   scale_y_continuous(name = data$unit[1])
>
> -- cut --
>
> The reporting / presentation is done in R Markdown. I would load the
> result data set once at the beginning and run the comparisons as plots
> on each variable named in the results data set under "variable".
>
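
The idea of plotting once per entry in "variable" could look roughly like
the following. It is a minimal sketch in base R (barplot() instead of
ggplot2, so nothing beyond a standard install is needed), assuming the long
layout described in the post. The data frame `results`, its made-up numbers,
and the helper `comparison_matrix` are hypothetical names for illustration.

```r
# Hypothetical result set in the long layout from the post: two variables,
# two groups, values in percent. The numbers are made up for illustration.
results <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 5),
  variable = rep(c("State", "State", "State", "Size", "Size"), times = 2),
  label    = rep(c("Bayern", "Berlin", "Hessen", "Small", "Large"), times = 2),
  value    = c(50, 20, 30, 60, 40,   45, 25, 30, 70, 30),
  unit     = "%",
  stringsAsFactors = FALSE
)

# One piece per variable; split() keeps the rows of each variable together.
by_variable <- split(results, results$variable)

# Turn one piece into a group-by-label matrix, the shape barplot() expects.
comparison_matrix <- function(piece) {
  m <- tapply(piece$value, list(piece$group, piece$label), sum)
  m[, unique(piece$label), drop = FALSE]  # keep labels in their original order
}

matrices <- lapply(by_variable, comparison_matrix)

# Draw one grouped bar chart per variable.
for (name in names(matrices)) {
  barplot(matrices[[name]], beside = TRUE, legend.text = TRUE,
          main = name, ylab = by_variable[[name]]$unit[1], las = 2)
}
```

Inside a single R Markdown chunk, a loop like this emits one figure per
variable, so adding a variable to the result set adds a plot without
touching the report code.
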
> If I follow this approach for my customer relationship survey, do you
> think I would face drawbacks or run into serious trouble?
>
> I am interested in your opinion and open to other approaches and
> suggestions.
>
> Kind regards
>
> Georg