[R] Approach for Storing Result Data
Bert Gunter
bgunter.4567 at gmail.com
Wed Mar 8 17:44:17 CET 2017
This does not appear to be a legitimate topic for r-help: it is
not a consulting service. Please see the posting guide.
Of course, others may disagree and reply. Wouldn't be the first time I'm wrong.
-- Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Mar 8, 2017 at 7:27 AM, <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> today I have a more general question concerning the approach of storing
> different values from the analysis of multiple variables.
>
> My task is to compare distributions in a universe with distributions from
> the respondents using a whole bunch of variables. Comparison shall be done
> on relative frequencies (proportions).
>
> I was thinking about the structure I should store the results in and came
> up with the following:
>
> -- cut --
>
> library(ggplot2)
> library(stringi)
>
> # Result data frame
> # Some sort of tidy data set where
> # each value is stored as an identity.
> # This way all values for all variables could be stored in
> # one unique data structure.
> # If an additional variable is added for the name of the
> # research, one could also build a result data set across
> # surveys.
> # Values for measure could be "number" for 'raw' values or
> # "freq" for frequencies/counts.
> # Values for unit could be "n" for 'numbers' and
> # "%" for percentages.
> d_test <- data.frame(
>   group = rep(c("Universe", "Respondents"), each = 16),
>   variable = rep("State", 32),
>   value = rep(c(11.3,
>                 12.7,
>                 3.3,
>                 5,
>                 0.6,
>                 8.1,
>                 6.2,
>                 5.8,
>                 6.4,
>                 14.5,
>                 8.3,
>                 0.3,
>                 3.8,
>                 2.5,
>                 8.1,
>                 3), 2),
>   label = rep(c("Baden-Wuerttemberg",
>                 "Bayern",
>                 "Berlin",
>                 "Brandenburg",
>                 "Bremen",
>                 "Hamburg",
>                 "Hessen",
>                 "Mecklenburg-Vorpommern",
>                 "Niedersachsen",
>                 "Nordrhein-Westfalen",
>                 "Rheinland-Pfalz",
>                 "Saarland",
>                 "Sachsen",
>                 "Sachsen-Anhalt",
>                 "Schleswig-Holstein",
>                 "Thueringen"), 2),
>   measure = rep("freq", 32),
>   unit = rep("%", 32),
>   stringsAsFactors = FALSE
> )
>
> # This way the variables can be selected using simple
> # value selection from Base R functionality.
> data <- d_test[d_test$variable == "State" ,]
>
> # And plot results for every variable.
> ggplot(
>   data = data,
>   aes(
>     x = label,
>     y = value,
>     fill = group)) +
>   geom_bar(stat = "identity", position = "dodge") +
>   theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
>   scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
>   scale_x_discrete(name = data$variable[1]) +
>   # value is numeric, so the y axis needs a continuous scale
>   scale_y_continuous(name = data$unit[1])
>
> -- cut --
>
> The reporting / presentation is done in R Markdown. I would load the
> result data set once at the beginning and run the comparisons as plots
> on each variable named in the results data set under "variable".
>
> If I follow this approach for my customer relationship survey, do you think I
> would face drawbacks or run into serious trouble?
>
> I am interested in your opinion and open to other approaches and
> suggestions.
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
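
The quoted message stores all results in one long-format data frame and
compares proportions between "Universe" and "Respondents". A minimal
base R sketch of that comparison might look as follows. It is not taken
from the original post: the data frame `results`, its values, and the
helper `compare_groups()` are hypothetical and only reuse the column
names of `d_test` above.

```r
# Hypothetical long-format results with the same columns as d_test.
# The values are made up for illustration only.
results <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 3),
  variable = rep("State", 6),
  label    = rep(c("Bayern", "Berlin", "Hessen"), 2),
  value    = c(12.7, 3.3, 6.2, 14.0, 2.5, 6.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: put the two groups side by side for one variable
# and compute the difference in percentage points.
compare_groups <- function(d) {
  universe    <- d[d$group == "Universe",    c("label", "value")]
  respondents <- d[d$group == "Respondents", c("label", "value")]
  wide <- merge(universe, respondents, by = "label",
                suffixes = c("_universe", "_respondents"))
  wide$diff <- wide$value_respondents - wide$value_universe
  wide
}

# One comparison table per entry in the "variable" column.
comparisons <- lapply(split(results, results$variable), compare_groups)
comparisons$State
```

Because every variable lives in the same structure, adding a second
variable only adds rows to `results`, and the `split()`/`lapply()` step
picks it up without further changes.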