[R] Problems with Boxplot

Petr PIKAL petr.pikal at precheza.cz
Fri Sep 4 12:42:52 CEST 2009


Hi

It is rather difficult to understand what you mean by your
questions/answers without real reproducible code.

r-help-bounces at r-project.org wrote on 03.09.2009 13:41:11:

> 
> I'm posting answers to my own Q's here - as far as I have answers -
> first so that people don't spend time on them, and second in case the
> solutions are helpful to anyone else in future.
> 
> 1) My first question is: is there a simple way of getting both dates
> along the x-axis and the "*100" calculation (or percentages)?
> I still don't know how to change the format of the y-axis tick labels.
> I'd be interested if anyone has a quick way to get percentages and
> additionally, how do I get numbers in the "0,000" format along the x or
> y-axis?  In the meantime, I can live with this.

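plot(1:10, 1:10, axes=FALSE)
# draw the y axis yourself: ticks at chosen positions, with custom labels
axis(2, at=c(2,3,7,9), labels=c(1.2, 2.38, 13.54, 16.8))

The same applies to boxplot: suppress the default axes and draw your own
with axis(), using whatever labels you want.

By

bbb <- boxplot(....)

you obtain an object which is used by bxp to draw the plot. See the help
page for boxplot, section "See Also".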

> 
> 2) Next is how can I put a legend somewhere to show that red is "data
> set 1" and blue is "data set 2".
> I did this with the following text:
> legend("top", c("Top","Bottom"), cex=1.5, lty=1:2, fill=c("lightblue",
> "salmon"), bty="n")

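You can go through the structure of the object produced by boxplot, and
you will see that the boxes are located on the x axis at 1 to the number
of boxes, and on the y axis according to the scale of the y axis. So a
legend can also be placed by giving x and y coordinates:

boxplot(rnorm(20), axes=FALSE)
# legend with its top-left corner at x = 0.5, y = 0
legend(.5, 0, legend=letters[1:3], col=1:3, pch=1)
# a second legend at x = 1, y = 0, with large filled symbols
legend(1, 0, legend=letters[1:3], col=1:3, pch=19, pt.cex=3)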
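
A possible sketch of a legend whose colors match the boxes, placed by
coordinates as described above. The data frame dat, its column names and
the legend labels are invented for the example.

```r
set.seed(2)
# Hypothetical data: two dates, two data sets per date
dat <- data.frame(d1.set1 = rnorm(20), d1.set2 = rnorm(20, 1),
                  d2.set1 = rnorm(20), d2.set2 = rnorm(20, 1))
cols <- c("lightblue", "salmon")       # recycled across the four boxes

bbb <- boxplot(dat, col = cols)
n_boxes <- ncol(bbb$stats)             # boxes sit at x = 1, ..., n_boxes

# Top-left corner of the legend at x = 0.5 and at the highest whisker end;
# fill= draws colored squares that match the box colors
legend(x = 0.5, y = max(bbb$stats),
       legend = c("data set 1", "data set 2"),
       fill = cols, bty = "n")
```

Using fill alone (without lty) keeps the legend to one colored square
per data set.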



> 
> 3) Is it possible to get the date to straddle across each of the two
> dates it covers: as it is, one tick has the date, the other does not.
> I didn't manage to do this, but as there were over 20 dates in the final
> data (i.e. 40 plots), by changing the width of the chart window, not
> every plot was labeled anyway and it was clear enough.

??????????
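
If the question is how to centre one date label under each pair of
boxes, one possible sketch is below. This is only a guess at what is
meant; the dates and the data are made up.

```r
set.seed(3)
dates <- c("2009-08-01", "2009-08-08", "2009-08-15")   # hypothetical dates
# Six columns: two data sets for each of the three dates
dat <- as.data.frame(matrix(rnorm(20 * 6), ncol = 6))

bbb <- boxplot(dat, xaxt = "n", col = c("lightblue", "salmon"))

# Boxes are at x = 1, ..., 6, so the pairs are centred at 1.5, 3.5, 5.5
mid <- seq(1.5, by = 2, length.out = length(dates))
axis(1, at = mid, labels = dates)
```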

> 
> 4) Is it possible to show both the median and the mean with boxplot?
> I gave up on this, but I think the data looks OK in the end with just
> the boxplot defaults.

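Again, the object produced by boxplot (bbb below) is your clue.

x <- rlnorm(200)
bbb <- boxplot(x)
# mark the mean with a large filled red point on the single box at x = 1
points(1, mean(x), cex=3, col=2, pch=19)

You can add anything, not just the mean, but remember that when you see
a boxplot you expect the line in the box to be the median, not the mean.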
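
With several boxes the same idea might look like the sketch below. The
data frame dat is invented for the example; the medians are taken from
the third row of bbb$stats.

```r
set.seed(4)
# Hypothetical data: two skewed columns, so mean and median differ
dat <- data.frame(a = rlnorm(200), b = rlnorm(200, 0.5))

bbb <- boxplot(dat)
means <- colMeans(dat, na.rm = TRUE)

# One point per box: box i is drawn at x = i
points(seq_along(means), means, pch = 19, col = 2, cex = 1.5)

# The medians drawn by boxplot are the third row of bbb$stats
meds <- bbb$stats[3, ]
```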

See also ?par for graphics options, and ?format and/or ?sprintf for
formatting numbers.

Regards
Petr


> 
> 5) Finally, the code works as described above (i.e. up to a point) with
> the "Post trial data.csv" file I have posted.  However when I try with a
> larger file ("Larger trial.csv", also posted), I get the message: "Error
> in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
> line 145 did not have 50 elements" when I get to the "data_headings"
> line.  I have no idea why R is seeing a difference between these two
> files.
> I ended up finding that even for specific small files, I got this error
> message, which prevented me from processing the data and so was fatal to
> the code.  I narrowed it down to a small file, and then looked at the
> csv file in notepad.  The bottom of the file (which was just 2 columns
> of data, of different column lengths), was along these lines:
> 
> -0.48013245,0.095652174
> -0.039344262,-0.067142857
> 0.018022077,-0.079295154
> -0.078534031,
> 0.010054845,
> 0.096153846,
> 0.177568018
> 0.013818182
> 0.002402883
> 
> It seemed that R could cope with empty columns - as long as there was a
> "," to indicate that there was indeed a column, but it could NOT cope
> with a column that didn't exist (because there was no ",").  The problem
> was that Excel, which was generating the CSV file, wasn't putting "," to
> indicate empty columns in certain circumstances.  The solution was to
> fill the empty cells in Excel with "na" before saving as CSV.  Excel
> then saves it correctly, and R deals with it correctly.
> 
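
As an alternative to editing the file in Excel, a sketch of how such
ragged rows might be diagnosed and read from within R. The small file
written here is made up to mimic the rows quoted above; count.fields and
the fill argument of read.table are base R.

```r
# Hypothetical file with the same kind of ragged rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b",
             "-0.48013245,0.095652174",
             "-0.078534031,",
             "0.177568018"), tmp)

# count.fields() reports how many fields each line has, which shows
# where the short lines are
nf <- count.fields(tmp, sep = ",")

# fill = TRUE pads rows that are too short with NA instead of stopping
dat <- read.table(tmp, sep = ",", header = TRUE, fill = TRUE)
```

read.csv sets fill = TRUE by default, so it may also read such a file
without changes.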
> The final code (though without the y-axis formatting being fixed) is:
> 
> testdata<- c("C:\\Files\\R\\Sample R code\\Post trial data.csv")
> new_data<- read.table(testdata, skip = 0, sep = ",", na.strings = "na",
>   header = TRUE)
> x11(width=16, height=7, pointsize=14)
> boxplot(new_data, outline = FALSE, col = c("lightblue", "salmon"),
>   las = 1, boxwex = 0.5)
> legend("top", c("Label for blue boxes","Label for red boxes"), cex=1.5,
>   lty=1:2, fill=c("lightblue", "salmon"), bty="n")
> title(main="Chart title text", cex.main = 1.8)
> grid() 
> 
> Guy
> 
> 
> gug wrote:
> > 
> > Hello,
> > 
> > I have been having difficulty getting boxplot to give the output I
> > want - probably a result of the way I have been handling the data.
> > 
> > The data is arranged in columns: each date has two sets of data.  The
> > number of data points varies with the date, so each column is of
> > different length.  I want to get a series of boxplots with the date
> > along the x-axis, with alternating colors, so that it is easy to see
> > the difference between the results within each date, as well as across
> > dates.
> > 
> > testdata<- c("C:\\Files\\R\\Sample R code\\Post trial data.csv")
> > data_headings <- read.table(testdata, skip = 0, sep = ",",
> >   header = FALSE)[1,]
> > my_data <- read.table(testdata, skip = 1, sep = ",", na.strings = "na",
> >   header = FALSE)
> > boxplot(my_data*100, names = data_headings, outline = FALSE,
> >   range = 0.3, border = c(2,4))
> > 
> > The result is a boxplot, but it does not show the date along the
> > bottom (the "names = data_headings" bit achieves nothing).  I can
> > alternatively try this:
> > 
> > new_data<- read.table(testdata, skip = 0, sep = ",", na.strings =
> > "na",header = TRUE)
> > boxplot(new_data,outline = FALSE, range = 0.3,border = c(2,4))
> > 
> > This takes all the data and plots it, but I then lose the ability to
> > multiply by 100 (I'm trying to show percentages: e.g. 10% as "10",
> > rather than as "0.1").
> > 
> > 1) My first question is: is there a simple way of getting both dates
> > along the x-axis and the "*100" calculation (or percentages)?
> > 
> > 2) Next is how can I put a legend somewhere to show that red is "data
> > set 1" and blue is "data set 2".
> > 
> > 3) Is it possible to get the date to straddle across each of the two
> > dates it covers: as it is, one tick has the date, the other does not.
> > 
> > 4) Is it possible to show both the median and the mean with boxplot?
> > 
> > 5) Finally, the code works as described above (i.e. up to a point)
> > with the "Post trial data.csv" file I have posted.  However when I try
> > with a larger file ("Larger trial.csv", also posted), I get the
> > message: "Error in scan(file, what, nmax, sep, dec, quote, skip,
> > nlines, na.strings, : line 145 did not have 50 elements" when I get to
> > the "data_headings" line.  I have no idea why R is seeing a difference
> > between these two files.
> > http://www.nabble.com/file/p25256461/Post%2Btrial%2Bdata.csv Post+trial+data.csv
> > http://www.nabble.com/file/p25256461/Larger%2Btrial.csv Larger+trial.csv
> > Thanks for any suggestions,
> > 
> > Guy Green
> > 
> > 
> > 
> 



