[R] Loop with ggplot2 not as simple as it seems...
Patricia Seo
pseo at stanford.edu
Tue Oct 28 01:58:45 CET 2014
Hi everyone,
I have been battling with this problem for the past month and reading all that I can about it, but I just can't seem to understand what I'm doing wrong. It seems easy, and I can replicate others' well-documented attempts, but I cannot seem to apply them to my data.
I have created a data.frame to easily illustrate my problem. I would like to plot columns 5:7 in my data without having to write out every single variable name. In this example there are only three columns, but in my actual data set there are more like 50. Here is the simplified data frame:
sid <- c(1001:1010) #student id
age <- c(10, 12, 14, 15, 13, 16, 14, 12, 14, 10)
race <- c("w", "b", "a", "w", "a", "w", "b", "a", "w", "a")
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
read <- rnorm(10, 100, 50)
write <- rnorm(10, 100, 50)
math <- rnorm(10, 100, 50)
scores <- data.frame(sid, age, race, gender, read, write, math)
My end goal is to produce a .png like this for every column:
library(ggplot2)
ggplot(scores, aes(x = read, fill = gender)) + geom_density(alpha = .3)
In other words, I would like to write a loop that would produce separate .pngs for each column. No facet-wrap. I would like one plot showing the distribution of one score for male and female in one .png file.
Failed Attempt #1 of 3: My first attempt was to melt the data, since I know ggplot2 likes melted (long-format) data. I melted it using sid, age, race and gender as the id variables, but I'm not sure how to produce the plot above using just the resulting values. In my actual data set there are over 50 columns to plot (instead of the three in the example scores data). I tried creating:
xobject <- subset(scores.melt, variable=read)
but it gives me the error: Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous
Error: Aesthetics must either be length one, or the same length as the dataProblems:test1
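A minimal sketch of how the melted route could work, assuming melt() here is reshape2::melt and using a cut-down copy of the scores data (the seed and the reduced column set are only for illustration). As far as I can tell, `variable=read` in subset() is treated as an extra named argument rather than a filter, so no rows are dropped; a logical test with `==` and a quoted level is what selects one score:

```r
library(ggplot2)
library(reshape2)  # assumption: the melt() in question is reshape2::melt

set.seed(1)  # hypothetical seed, only to make the example reproducible
scores <- data.frame(
  sid    = 1001:1010,
  gender = rep(c("M", "F"), 5),
  read   = rnorm(10, 100, 50),
  write  = rnorm(10, 100, 50),
  math   = rnorm(10, 100, 50)
)

# Long format: one row per student per score, in columns `variable` and `value`
scores.melt <- melt(scores, id.vars = c("sid", "gender"))

# Keep only the rows for one score; note `==` and the quoted level name
xobject <- subset(scores.melt, variable == "read")

# Map the numeric `value` column (not the data frame itself) to x
p <- ggplot(xobject, aes(x = value, fill = gender)) +
  geom_density(alpha = .3)
```

Printing `p` (or passing it to ggsave()) should then give the density of the read scores split by gender.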
Failed Attempt #2 of 3: My second attempt is below. It has a problem similar to what others have reported: ggplot changes the name of each plot but keeps the last data column, so all the graphs look the same. It works in terms of titles but not data! I have also tried as_string(), but to no avail:
for (i in 5:7) {
  column_to_plot = as.character(paste("Col_", i, sep=""))
  png(paste0("Graph", column_to_plot,".png"))
  ggplot(scores, aes(x=column_to_plot, fill=gender )) + geom_density(alpha=.3)
  ggsave(paste0("Graph", column_to_plot,".png"))
}
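A hedged sketch of what a working version of this loop might look like. In the loop above, paste("Col_", i, sep="") produces strings such as "Col_5", which are not column names in scores, and aes() sees column_to_plot as a plain character value rather than a column. The sketch instead looks up the real column name and refers to it through the `.data` pronoun (available in ggplot2 3.0 and later; aes_string() was the older way to do this). The out_dir location, the seed and the plot size are assumptions for illustration:

```r
library(ggplot2)

set.seed(1)  # hypothetical seed, only to make the example reproducible
scores <- data.frame(
  sid    = 1001:1010,
  age    = c(10, 12, 14, 15, 13, 16, 14, 12, 14, 10),
  race   = c("w", "b", "a", "w", "a", "w", "b", "a", "w", "a"),
  gender = rep(c("M", "F"), 5),
  read   = rnorm(10, 100, 50),
  write  = rnorm(10, 100, 50),
  math   = rnorm(10, 100, 50)
)

out_dir <- tempdir()  # assumption: write the files somewhere disposable
files <- character(0)

for (i in 5:7) {
  column_to_plot <- names(scores)[i]  # "read", "write", "math"

  # .data[[...]] looks the column up by its name held in a string
  p <- ggplot(scores, aes(x = .data[[column_to_plot]], fill = gender)) +
    geom_density(alpha = .3) +
    labs(x = column_to_plot)

  # ggsave() opens and closes the device itself, so no png() call is needed
  out_file <- file.path(out_dir, paste0("Graph_", column_to_plot, ".png"))
  ggsave(out_file, plot = p, width = 6, height = 4)
  files <- c(files, out_file)
}
```

Passing the plot explicitly via `plot = p` avoids relying on whatever plot was created last.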
Failed Attempt #3 of 3: My third attempt is my most successful one, because I created a list that holds the variables I would like to plot. Unfortunately, I have two problems with this method: (1) I don't know how to include the corresponding title, so that I know which plot corresponds to which score, even though I do set the names of Indexes, and (2) it is annoying to have to write out the entire list of variables I want to plot, especially if there are 50 variable names. I know I can specify columns 5:7 (like above), but I'm not grasping the logic of this loop:
Indexes = list()
Indexes[[1]] = scores$read
Indexes[[2]] = scores$write
Indexes[[3]] = scores$math
names(Indexes) = c("Scores for Read",
                   "Scores for Write", "Scores for Math")
##### for some reason, I have to run the top half and after it has processed that, run the bottom half#####
for(i in seq(along = Indexes)) {
  Index = Indexes[[i]]
  for(j in 1:length(gender)) {
    png(paste0("Graph", Index,".png"))
    ggplot(scores, aes(x=Index, fill = gender)) + geom_density(alpha=.3)
    ggsave(paste0("Graph", Index,".png"))
  }
}
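A rough sketch of how the titles could travel with the columns without typing every name, assuming the score columns can be picked by position or name. Here the titles are generated from the column names, and the plots are built inside a function called by lapply(), so each plot keeps its own column name instead of sharing one loop variable. The helper cap(), the seed and the reduced data frame are hypothetical and only for illustration:

```r
library(ggplot2)

set.seed(1)  # hypothetical seed, only to make the example reproducible
scores <- data.frame(
  gender = rep(c("M", "F"), 5),
  read   = rnorm(10, 100, 50),
  write  = rnorm(10, 100, 50),
  math   = rnorm(10, 100, 50)
)

# In the full data frame this could be names(scores)[5:7]
score_cols <- c("read", "write", "math")

# Hypothetical helper: capitalise the first letter of each name
cap <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

# Named vector: column name -> plot title
titles <- setNames(paste("Scores for", cap(score_cols)), score_cols)

# One plot per column; no inner loop over gender is needed,
# because fill = gender already splits the density by group
plots <- lapply(score_cols, function(col) {
  ggplot(scores, aes(x = .data[[col]], fill = gender)) +
    geom_density(alpha = .3) +
    labs(title = titles[[col]], x = col)
})
names(plots) <- score_cols
```

Each element, for example plots$read, can then be printed or passed to ggsave() with a file name built from the same column name.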
Any help would be much appreciated. I know this is a Frankenstein of previous questions and problems with loops in ggplot2, but just understanding what I'm doing wrong would be a huge help.