[R] about ECDF display in ggplot2

Bogdan Tanasa tanasa using gmail.com
Mon Jul 9 02:44:45 CEST 2018


Dear Jeff,

Thank you for your email.

Yes, to be more descriptive, please find attached to my email the
following files (my apologies for sending these as attachments; I do not
have a web server running at the moment):

-- the R script (R_script_display_ECDF.R), which reads the file "LENGTH"
and plots the ECDF with either the standard R function or ggplot2

-- the ECDF plotted with the standard R function
("display.R.ecdf.LENGTH.pdf")

-- the ECDF plotted with ggplot2 ("display.ggplot2.ecdf.LENGTH.pdf")

Over xlim(0,500), the ECDF looks very different between the two plots
(plot(ecdf) vs ggplot2). Could you please advise why? What should I change
in my ggplot2 code?

thanks a lot,

- bogdan

PS: the R code is also included below:

library("ggplot2")

file <- read.delim("LENGTH", sep="\t", header=TRUE, stringsAsFactors=FALSE)

############################# display with PLOT FUNCTION:

pdf("display.R.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

plot(ecdf(file$LENGTH), xlab="DEL SIZE",
                        ylab="fraction of DEL",
                        main="LENGTH of DEL",
                        xlim=c(0,500),
                        col = "dark red", axes = FALSE)

ticks_y <- c(0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4)

axis(2, at=ticks_y, labels=ticks_y, col.axis="red")

ticks_x <- c(0, 100, 200, 400, 500, 600, 700, 800)

axis(1, at=ticks_x, labels=ticks_x, col.axis="blue")

dev.off()

############################# display in GGPLOT2 :

BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)

barfill <- "#4271AE"
barlines <- "#1F3552"

pdf("display.ggplot2.ecdf.LENGTH.pdf", width=10, height=6, paper='special')

ggplot(file, aes(LENGTH)) +
          stat_ecdf(geom = "point", colour = barlines, fill = barfill) +
          scale_x_continuous(name = "LENGTH of DEL",
                             breaks = BREAKS,
                             limits=c(0, 500)) +
          scale_y_continuous(name = "FRACTION") +
          ggtitle("ECDF of LENGTH") +
          theme_bw() +
          theme(legend.position = "bottom", legend.direction = "horizontal",
                legend.box = "horizontal",
                legend.key.size = unit(1, "cm"),
                axis.title = element_text(size = 12),
                legend.text = element_text(size = 9),
                legend.title = element_text(face = "bold", size = 9))

dev.off()
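
One way to check whether the limits alone could explain the difference is
to compare the ECDF of all values with the ECDF of only the values inside
[0, 500]. The sketch below uses simulated lengths, not the attached LENGTH
file; the names len, n_outside, F_all and F_trunc are made up for the
illustration.

```r
set.seed(42)

# Simulated stand-in for file$LENGTH (an assumption, not the real data):
# 900 values in 1..500 and 100 values above 500
len <- c(sample(1:500, 900, replace = TRUE),
         sample(501:10000, 100, replace = TRUE))

# How many values fall outside the limits c(0, 500)
n_outside <- sum(len < 0 | len > 500)

# ECDF over all values; plot(ecdf(len), xlim = c(0, 500)) draws this one
# and only restricts the visible window
F_all <- ecdf(len)

# ECDF over the values inside the limits only, i.e. what is left once the
# out-of-range values have been dropped before the calculation
F_trunc <- ecdf(len[len >= 0 & len <= 500])

n_outside     # 100
F_all(500)    # 0.9
F_trunc(500)  # 1
```

If the ggplot2 curve reaches 1 at x = 500 while the base R curve stays
below 1, that would be consistent with the values above 500 having been
removed from the calculation rather than only from the display.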








On Sat, Jul 7, 2018 at 9:47 PM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:

> It is a feature of ggplot that points excluded by limits raise warnings,
> while base graphics do not.
>
> You may find that using coord_cartesian with the xlim=c(0,500) argument
> works better with ggplot by showing the consequences of points out of the
> limits on lines within the viewport.
>
> There are other possible problems with your data that your
> non-reproducible example does not show, and sending R code in
> HTML-formatted email usually corrupts it, so please follow the
> recommendations in the Posting Guide next time you post.
>
> On July 6, 2018 4:32:41 PM PDT, Bogdan Tanasa <tanasa using gmail.com> wrote:
> >Dear all,
> >
> >I would appreciate having your advice/suggestions/comments on the
> >following:
> >
> >1 -- starting from a vector that contains LENGTHS (numerically, the
> >values are from 1 to 10 000)
> >
> >2 -- shall I display the ECDF by using the R code and some "limits" :
> >
> >BREAKS = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500,
> >           1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000)
> >
> >ggplot(x, aes(LENGTH)) +
> >          stat_ecdf(geom = "point") +
> >          scale_x_continuous(name = "LENGTH of DEL",
> >                             breaks = BREAKS,
> >                             limits=c(0, 500))
> >
> >3 -- I am getting the following warning message: "Warning message:
> >Removed 109 rows containing non-finite values (stat_ecdf)."
> >
> >The question is: are these 109 values removed from the VISUALIZATION
> >because I set up the "limits", or are these 109 values removed from the
> >statistical CALCULATION?
> >
> >4 -- in contrast, if I use the standard R function plot(ecdf), there is
> >no warning message
> >
> >plot(ecdf(x$LENGTH), xlab="DEL LENGTH",
> >                     ylab="Fraction of DEL", main="DEL", xlim=c(0,500),
> >                     col = "dark red")
> >
> >Thanks a lot !
> >
> >-- bogdan
> >
> >       [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>
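
The coord_cartesian suggestion quoted above could be sketched roughly as
follows. The data are simulated, not the LENGTH file, and the names dat,
p_scale, p_coord and in_view are hypothetical. layer_data() is used only to
inspect the values that stat_ecdf computes for each variant.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in for the LENGTH column (an assumption):
# 900 values in 1..500 and 100 values above 500
dat <- data.frame(LENGTH = c(sample(1:500, 900, replace = TRUE),
                             sample(501:10000, 100, replace = TRUE)))

# Variant 1: limits on the scale. Values above 500 are turned into NA
# before stat_ecdf runs, so the ECDF is computed from 900 values only.
p_scale <- ggplot(dat, aes(LENGTH)) +
  stat_ecdf(geom = "point") +
  scale_x_continuous(limits = c(0, 500))

# Variant 2: limits on the coordinate system. The ECDF is computed from
# all 1000 values and the plot is only zoomed to 0..500.
p_coord <- ggplot(dat, aes(LENGTH)) +
  stat_ecdf(geom = "point") +
  coord_cartesian(xlim = c(0, 500))

# Values computed by the stat in each variant (the first one warns about
# removed rows, which is suppressed here)
d_scale <- suppressWarnings(layer_data(p_scale))
d_coord <- layer_data(p_coord)

# Keep only the finite x positions inside the visible window
in_view <- function(d) d[is.finite(d$x) & d$x <= 500, ]

max_scale <- max(in_view(d_scale)$y)
max_coord <- max(in_view(d_coord)$y)
```

With this simulated input, max_scale should come out as 1 and max_coord as
0.9, i.e. only the coord_cartesian variant agrees with what
plot(ecdf(...), xlim = c(0, 500)) shows.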

-------------- next part --------------
A non-text attachment was scrubbed...
Name: display.ggplot2.ecdf.LENGTH.pdf
Type: application/pdf
Size: 8841 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180708/75da9c56/attachment.pdf>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: display.R.ecdf.LENGTH.pdf
Type: application/pdf
Size: 13600 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180708/75da9c56/attachment-0001.pdf>

