[R] Approach for Storing Result Data
G.Maubach at weinwolf.de
Wed Mar 8 16:27:08 CET 2017
Hi All,
Today I have a more general question about how best to store the various
values that result from analysing multiple variables.
My task is to compare the distributions in the universe with the
distributions among the respondents for a whole bunch of variables. The
comparison is to be done on relative frequencies (proportions).
I thought about which structure I should store the results in and came
up with the following:
-- cut --
library(stringi)
library(ggplot2)
# Result data frame
# Some sort of tidy data set where
# each value is stored in its own row.
# This way all values for all variables can be stored in
# one single data structure.
# If an additional variable is added for the name of the
# study, one could also build a result data set across
# surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
# Test data: both groups get the same 16 values here.
d_test <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 16),
  variable = rep("State", 32),
  value    = rep(c(11.3, 12.7, 3.3, 5, 0.6, 8.1, 6.2, 5.8,
                   6.4, 14.5, 8.3, 0.3, 3.8, 2.5, 8.1, 3), 2),
  label    = rep(c("Baden-Wuerttemberg", "Bayern", "Berlin",
                   "Brandenburg", "Bremen", "Hamburg", "Hessen",
                   "Mecklenburg-Vorpommern", "Niedersachsen",
                   "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland",
                   "Sachsen", "Sachsen-Anhalt", "Schleswig-Holstein",
                   "Thueringen"), 2),
  measure  = rep("freq", 32),
  unit     = rep("%", 32),
  stringsAsFactors = FALSE
)
# This way a single variable can be selected with
# simple base R subsetting on its value.
data <- d_test[d_test$variable == "State", ]
# Plot the results for the selected variable.
ggplot(
  data = data,
  aes(
    x = label,
    y = value,
    fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  # value is numeric, so the y axis needs a continuous scale
  scale_y_continuous(name = data$unit[1])
-- cut --
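A minimal sketch of how rows in this long format could be produced from raw respondent data, assuming one row per respondent and one column per variable. The helper `make_rows()`, the data frame `raw` and the survey name "CRM 2017" are hypothetical and only illustrate the idea; the optional `survey` column corresponds to the cross-survey idea mentioned in the comments above.

```r
# Hypothetical helper (not part of the original code): converts one raw
# variable into rows of the long result format used for d_test, plus an
# optional `survey` column for building results across surveys.
make_rows <- function(x, group, variable, survey = NA_character_) {
  tab <- table(x)                                   # counts per category
  data.frame(
    survey   = survey,
    group    = group,
    variable = variable,
    value    = as.vector(round(100 * prop.table(tab), 1)),  # percentages
    label    = dimnames(tab)[[1]],                  # category names
    measure  = "freq",
    unit     = "%",
    stringsAsFactors = FALSE
  )
}

# Made-up raw respondent data: one row per respondent.
raw <- data.frame(
  State  = c("Bayern", "Berlin", "Bayern", "Hessen"),
  Gender = c("f", "m", "f", "f"),
  stringsAsFactors = FALSE
)

# One block of rows per variable, stacked into a single data frame.
results <- do.call(rbind, lapply(names(raw), function(v) {
  make_rows(raw[[v]], group = "Respondents", variable = v,
            survey = "CRM 2017")
}))
results
```

Rows for the universe (group = "Universe") can be appended with `rbind()` in the same way. Storing the raw counts as extra rows with measure = "number" and unit = "n" would keep the base sizes available, which the percentages alone do not.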
The reporting / presentation is done in R Markdown. I would load the
result data set once at the beginning and then run the comparisons, as
plots, for each variable listed in the "variable" column of the result
data set.
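A base R sketch of the per-variable step, assuming the long format of d_test: `split()` gives one subset per value of `variable`, and each subset is cross-tabulated into categories by groups, with the difference in percentage points added. The small `result_data` object and the function `compare_variable()` are hypothetical example names with made-up values.

```r
# Made-up long result data in the d_test format.
result_data <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 3),
  variable = "State",
  value    = c(50, 30, 20, 45, 35, 20),
  label    = rep(c("Bayern", "Berlin", "Hessen"), 2),
  measure  = "freq",
  unit     = "%",
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows = categories (label), columns = groups,
# plus the deviation of respondents from universe in percentage points.
compare_variable <- function(d) {
  wide <- tapply(d$value, list(d$label, d$group), sum)
  cbind(wide, diff_pp = wide[, "Respondents"] - wide[, "Universe"])
}

# One comparison matrix per variable, named after the variable.
comparisons <- lapply(split(result_data, result_data$variable),
                      compare_variable)
comparisons$State
```

Inside an R Markdown chunk, a loop over `names(comparisons)` could then draw one chart per variable, for example with base R's `barplot()` and `beside = TRUE`, or by passing the matching subset of the long data to the ggplot2 call above.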
If I follow this approach for my customer relationship survey, do you think
I would face drawbacks or run into serious trouble?
I am interested in your opinion and open to other approaches and
suggestions.
Kind regards
Georg