[R] Error bars and CI
Mohan.Radhakrishnan at cognizant.com
Thu Jun 18 11:43:12 CEST 2015
Hi Dennis,
I have copied the R-help list. Could you explain why we can't compute a CI and error bars from this data set?
The graph generated below has equal-sized error bars and a 99% confidence band. Groups are not needed here. The error bar and CI calculations could be incorrect, but I am able to draw the plot.
      V1 IDX
1  0.796   1
2  0.542   2
3  0.510   3
4  0.617   4
5  0.482   5
6  0.387   6
7  0.272   7
8  0.536   8
9  0.498   9
10 0.402  10
11 0.328  11
12 0.542  12
13 0.299  13
14 0.647  14
15 0.291  15
16 0.815  16
17 0.680  17
18 0.363  18
19 0.560  19
20 0.334  20
Assume the data frame is named 'jc'.
library(ggplot2)
print(summary(jc$V1))
# Half-width of a 99% t-interval for the overall mean of V1
error <- qt(0.995, df = length(jc$V1) - 1) * sd(jc$V1) / sqrt(length(jc$V1))
error1 <- mean(jc$V1) - error   # lower limit
error2 <- mean(jc$V1) + error   # upper limit
print(error1)
print(error2)
# Error bars: each point +/- the SD of the whole series; ribbon: the overall 99% CI
q <- ggplot(jc, aes(x = IDX, y = V1)) +
  geom_line(colour = "red") +
  geom_errorbar(aes(ymin = V1 - sd(V1), ymax = V1 + sd(V1)), width = 0.25) +
  geom_ribbon(aes(ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
  xlab("Iterations") + ylab("Java Collections") + theme_bw()
print(q)
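For reference, the interval computed above is a single t-based 99% confidence interval for the overall mean of V1, not an interval per iteration. The sketch below is only a cross-check under the assumption that the 20 posted values are typed in by hand as 'jc'; the names half_width, ci_manual and ci_ttest are introduced here for illustration. It compares the hand calculation with the interval reported by t.test().

```r
# The 20 values of V1 posted above, entered by hand (assumption: no other rows)
jc <- data.frame(
  V1 = c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
         0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334),
  IDX = 1:20
)

n <- length(jc$V1)

# Hand calculation: mean +/- t(0.995, n - 1) * sd / sqrt(n)
half_width <- qt(0.995, df = n - 1) * sd(jc$V1) / sqrt(n)
ci_manual <- c(mean(jc$V1) - half_width, mean(jc$V1) + half_width)

# The one-sample t.test() interval uses the same formula
ci_ttest <- as.numeric(t.test(jc$V1, conf.level = 0.99)$conf.int)

print(ci_manual)
print(ci_ttest)
```

Both calls describe uncertainty about one number, the mean of all 20 values, which is why the resulting band is a horizontal strip of constant height.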
Thanks,
Mohan
-----Original Message-----
From: Dennis Murphy [mailto:djmuser at gmail.com]
Sent: Wednesday, June 17, 2015 8:42 PM
To: Radhakrishnan, Mohan (Cognizant)
Subject: Re: [R] Error bars and CI
Q: How do you expect to get error bars when you plot "groups" having samples of size 1? If you "are not grouping", then what is the point of trying to manufacture variation where none exists? I'd suggest you think a little more deeply about what you can achieve with the available data.
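To make the point about samples of size 1 concrete, here is a small base R sketch. It assumes only the first few posted values; the names x_single, jc_head and sd_by_idx are hypothetical. It shows that the standard deviation of a single observation is undefined, so there is nothing from which to build a per-point error bar.

```r
# A single observation, as at IDX = 1 in the posted data
x_single <- 0.796
print(sd(x_single))   # the SD of one value is NA

# The same holds for every "group" when each IDX has exactly one row
jc_head <- data.frame(V1 = c(0.796, 0.542, 0.510, 0.617), IDX = 1:4)
sd_by_idx <- tapply(jc_head$V1, jc_head$IDX, sd)
print(sd_by_idx)
```

With an NA standard deviation the standard error and any t-based interval are NA as well, which is consistent with the error bars not appearing in the stat_summary() attempt.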
This plot visualizes the data you posted. Every point is accounted for. I named the input data frame DF.
ggplot(DF, aes(x = IDX, y = V1)) +
geom_line() + geom_point()
If you don't have replicate data at each unique x-value you want to plot, you cannot legitimately plot error bars, confidence intervals or any other visual that describes (a summary of) a distribution. If the values of V1 are supposed to represent averages that come from another data set, then you should have a corresponding column of standard deviations/standard errors, and *then* you can plot error bars, CIs, etc. Without a legitimate measure of variation in your input data frame, I don't see how you can possibly generate a line graph with accompanying error bars/CIs.
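As an illustration of what replicate data would make possible, here is a minimal sketch. Everything in it is an assumption made for the example: the data are simulated with rnorm(), the number of replicates per iteration (n_rep) is invented, and the names reps, summ and tcrit are hypothetical. It is not the poster's data or method.

```r
library(ggplot2)

set.seed(1)   # simulated data, for illustration only
n_rep <- 5    # hypothetical number of replicate runs per iteration
reps <- data.frame(
  IDX = rep(1:20, each = n_rep),
  V1  = rnorm(20 * n_rep, mean = 0.5, sd = 0.1)
)

# Per-iteration mean and standard error of the mean
summ <- do.call(rbind, lapply(split(reps, reps$IDX), function(d) {
  data.frame(IDX = d$IDX[1], mean = mean(d$V1), se = sd(d$V1) / sqrt(nrow(d)))
}))

# Multiplier for a 99% t-interval with n_rep - 1 degrees of freedom
tcrit <- qt(0.995, df = n_rep - 1)

# Ribbon: 99% CI per iteration; error bars: mean +/- one standard error
p <- ggplot(summ, aes(x = IDX, y = mean)) +
  geom_ribbon(aes(ymin = mean - tcrit * se, ymax = mean + tcrit * se),
              fill = "ivory2", alpha = 0.4) +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.25) +
  theme_bw()
print(p)
```

Here the bars and the band vary from one iteration to the next because each is computed from that iteration's own replicates, which is the measure of variation the single-row-per-IDX data set lacks.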
Dennis
On Wed, Jun 17, 2015 at 1:13 AM, <Mohan.Radhakrishnan at cognizant.com> wrote:
> I think it could be something like this. But the mean is for the entire set. Not groups.
> I get a graph with this code but error bars are not there.
>
>
> p <- ggplot(jc, aes(IDX, V1, colour = V1))
> p <- p + stat_summary(fun = mean, geom = "point")
> p <- p + stat_summary(fun = mean, geom = "line")
> p <- p + stat_summary(fun.data = mean_cl_normal, fun.args = list(conf.int = .99),
>                       geom = "errorbar", width = 0.2)
>
>
> Thanks,
> Mohan
>
> -----Original Message-----
> From: Radhakrishnan, Mohan (Cognizant)
> Sent: Wednesday, June 17, 2015 12:54 PM
> To: 'Dennis Murphy'
> Cc: r-help at r-project.org
> Subject: RE: [R] Error bars and CI
>
> Your sample code is working. But I am missing the logic when my dataset is involved.
>
> My full dataset is this. It is the V1 column I am interested in. I am not 'grouping' here.
>
>       V1 IDX
> 1  0.796   1
> 2  0.542   2
> 3  0.510   3
> 4  0.617   4
> 5  0.482   5
> 6  0.387   6
> 7  0.272   7
> 8  0.536   8
> 9  0.498   9
> 10 0.402  10
> 11 0.328  11
> 12 0.542  12
> 13 0.299  13
> 14 0.647  14
> 15 0.291  15
> 16 0.815  16
> 17 0.680  17
> 18 0.363  18
> 19 0.560  19
> 20 0.334  20
>
> Thanks,
> Mohan
>
> -----Original Message-----
> From: Dennis Murphy [mailto:djmuser at gmail.com]
> Sent: Tuesday, June 16, 2015 1:18 AM
> To: Radhakrishnan, Mohan (Cognizant)
> Subject: Re: [R] Error bars and CI
>
> Hi:
>
> Firstly, your dplyr code to generate the summary data frame is unnecessary and distracting, particularly since you didn't provide the input data set; you are asked to provide a *minimal* reproducible example, which you could easily have done with a built-in data set.
> That said, to get what I perceive you want, I used the InsectSprays data from the autoloaded datasets package.
>
> # Function to compute standard error of a mean
> sem <- function(x) sqrt(var(x)/length(x))
>
> ## Use InsectSprays data for illustration
> ## Compute mean and SE of count for each level of spray
>
> library(dplyr)
> library(ggplot2)
>
> insectSumm <- InsectSprays %>%
>   group_by(spray) %>%
>   summarise(mean = mean(count), se = sem(count))
>
>
> # Since the x-variable is a factor, need to map group = 1 to
> # draw lines between factor levels. geom_pointrange() can be
> # used to produce the 99% CIs per factor level, geom_errorbar()
> # for the mean +/- SE. I ordered the geoms so that the errorbar
> # is last, but if you want it (mostly) overwritten, put the
> # geom_pointrange() call last.
>
> ggplot(insectSumm, aes(x = spray, y = mean)) +
>   theme_bw() +
>   geom_line(aes(group = 1), size = 1, color = "darkorange") +
>   geom_pointrange(aes(ymin = mean - qt(.995, 11) * se,
>                       ymax = mean + qt(.995, 11) * se),
>                   size = 1.5, color = "firebrick") +
>   geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2,
>                 size = 1)
>
> Clearly, you can pipe all the way through the ggplot() call, but I wanted to check the contents of the summary data frame first.
>
> Dennis
>
> On Mon, Jun 15, 2015 at 3:51 AM, <Mohan.Radhakrishnan at cognizant.com> wrote:
>> Hi,
>>
>> I want to plot a line graph using this data. IDX is the x-axis and V1 is the y-axis. I also want standard error bars and a 99% CI to be shown. My code is given below. The section that plots the graph is the problem: I don't see all the points in the line graph with error bars. How can I also show the 99% CI in the graph?
>>
>>      V1 IDX
>> 1 0.987  21
>> 2 0.585  22
>> 3 0.770  23
>> 4 0.711  24
>>
>> library(stringr)
>> library(dplyr)
>> library(ggplot2)
>>
>> data <- read.table("D:\\jmh\\jmh.txt",sep="\t")
>>
>> final <-data %>%
>> select(V1) %>%
>> filter(grepl("^Iteration", V1)) %>%
>> mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))
>>
>> final <- mutate(final,IDX = 1:n())
>>
>> jc <- final %>%
>> filter(IDX < 21)
>>
>>
>> #Convert to numeric
>> jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
>>
>> print(jc)
>>
>> # The following section is the problem.
>>
>> sem <- function(x){
>> sd(x)/sqrt(length(x))
>> }
>>
>> meanvalue <- apply(jc,2,mean)
>> semvalue <- apply(jc, 2, sem)
>>
>> mean_sem <- data.frame(mean= meanvalue, sem= semvalue,
>> group=names(jc))
>>
>> #larger font
>> theme_set(theme_gray(base_size = 20))
>>
>> #plot using ggplot
>> p <- ggplot(mean_sem, aes(x=group, y=mean)) +
>> geom_line(stat='identity') +
>> geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),
>> width=.2)
>> print(p)
>>
>> Thanks,
>> Mohan
>> This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.