[R] Package(s) for making waffle plot-like figures?

Jim Lemon jim at bitwrit.com.au
Sun Nov 3 04:30:17 CET 2013


On 11/02/2013 10:35 AM, Zhao Jin wrote:
> Dear all,
>
> I am trying to make a series of waffle plot-like figures for my data to
> visualize the ratios of amino acid residues at each position. For each one
> of 37 positions, there may be one to four different amino acid residues. So
> the data consist of the positions, what residues are there, and the ratios
> of residues. The ratios of residues at a position add up to 100, or close
> to 100 (more on this soon)*. I am hoping to make a *square* waffle
> plot-like figure for each position, and fill the 10 X 10 grids with colors
> representing each amino acid residue and areas for grids of a certain color
> corresponding to the ratio of that residue. Then I could line up all the
> plots in one row from position 1 to position 37.
> *: if the sum of the ratios is less than 100 at a position, that's because
> of an unknown residue which I did not include in the table.
>
> I am attaching the dput output for my data here:
> structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
> 8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
> 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
> 26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
> 36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
> 7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
> 14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
> 15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
> 12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
> "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
> "Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
> 100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
> 1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
> 98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
> 100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names =
> c("position",
> "residue", "ratio"), class = "data.frame", row.names = c("1",
> "2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
> "17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
> "32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
> "45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
> "58", "59", "60", "61", "62", "63", "64", "65"))
>
> Inspired by a statexchange post, I am using these scripts to make the plots
> :
> library(ggplot2)
> col4=c('#E66101','#FDB863','#B2ABD2','#5E3C99')
> dflist=list()
> for (i in 1:37){
> residue_num=length(which(df$position==i))
> dflist[[i]]=df[df$position==i,2:3]
> waffle=expand.grid(y=1:residue_num,x=seq_len(ceiling(sum(dflist[[i]]$ratio)/residue_num)))
> residuevec=rep(dflist[[i]]$residue,dflist[[i]]$ratio)
> waffle$residue=c(as.vector(residuevec),rep(NA,nrow(waffle)-length(residuevec)))
> png(paste('plot',i,'.png',sep=''))
> print(ggplot(waffle, aes(x = x, y = y, fill = residue)) + geom_tile(color =
> "white") + scale_fill_manual("residue",values = col4) + coord_equal() +
> theme(panel.grid.minor=element_blank(),panel.grid.major=element_blank())
> + theme(axis.ticks=element_blank()) +
> theme(axis.text.x=element_blank(),axis.text.y=element_blank()) +
> theme(axis.title.x=element_blank(),axis.title.y=element_blank())
> )
> dev.off()}
>
> With my scripts, I could make a waffle plot, but not a *square* 10 X 10
> waffle plot. Also, the grid size differs for positions with different
> numbers of residues. I am suspecting that I didn't use coord_equal()
> correctly.
>
> So I wonder how I can make the plots like I described above in ggplot2 or
> with some other packages. Also, is there a way to assign a color to
> different residues, say, purple for alanine, blue for glycine, etc, and
> incorporate that information in the for loop?
>
Hi Zhao,
By beginning with a 10x10 matrix of NA values and then replacing some of 
them with a color, I think you can do what you want. First you need a 
function to fill one corner of your matrix with values, leaving the rest 
uncolored (i.e. NA):

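fill.corner<-function(x,nrow,ncol) {
  xlen<-length(x)
  # x must fit in the matrix (exactly filling it is fine)
  if(nrow*ncol >= xlen) {
   newmat<-matrix(NA,nrow=nrow,ncol=ncol)
   # find the smallest square that will hold all the values
   xside<-1
   while(xside*xside < xlen) xside<-xside+1
   row<-1
   col<-1
   # fill column by column, xside cells per column
   for(xindex in seq_len(xlen)) {
    newmat[row,col]<-x[xindex]
    if(row == xside) {
     col<-col+1
     row<-1
    }
    else row<-row+1
   }
   return(newmat)
  }
  cat("Too many values in x for",nrow,"by",ncol,"\n")
  return(NULL)
}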
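
To see what the filling pattern looks like on a small case, here is a minimal vectorised sketch of the same idea. The name fill_corner_sketch is made up for this illustration (it is not a plotrix function), and like the loop version it assumes a square grid:

```r
# Hypothetical helper for illustration only: fills the top-left corner of an
# NA matrix column by column, using the smallest square that holds all of x.
fill_corner_sketch <- function(x, nrow, ncol) {
  n <- length(x)
  stopifnot(n <= nrow * ncol)
  side <- ceiling(sqrt(n))            # smallest side with side^2 >= n
  m <- matrix(NA, nrow = nrow, ncol = ncol)
  if (n > 0) {
    k <- seq_len(n) - 1               # zero-based position of each value
    m[cbind(k %% side + 1, k %/% side + 1)] <- x
  }
  m
}

# five colored cells in a 4 x 4 grid: a full column of 3, then 2 more
demo <- fill_corner_sketch(rep("red", 5), 4, 4)
print(demo)
```

The cells left as NA are the ones that stay uncolored in the plot.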

Then you have to split your data frame into 37 smaller data frames, one 
per position, and build a matrix of colors for each of them to display 
on your 37 waffle plots:

library(plotrix)
# get an "alphabet" of colors
alphacol<-rainbow(18)
# the actual values in the plotted matrix don't matter
fakemat<-matrix(1:100,nrow=10)
# pick off the positions one by one
# (zjdat is the data frame from your dput output)
for(pos in 1:37) {
  posdf<-zjdat[zjdat$position == pos,]
  # repeat each residue's color once per percentage point
  rescol<-rep(alphacol[as.numeric(posdf$residue)],posdf$ratio)
  if(!is.null(resmat<-fill.corner(rescol,10,10)))
   color2D.matplot(fakemat,border="lightgray",cellcolors=resmat,
    yrev=FALSE,main=c(pos,length(resmat)))
}
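
On the question of giving each residue its own fixed color (purple for alanine, blue for glycine and so on), one possible approach is a named color vector looked up by residue letter. This is only a sketch: the name rescolors and the colors themselves are placeholders, and the small data frame just copies position 10 from the posted data (79 A, 9 E, with 12 unknown):

```r
# Hypothetical palette: start from rainbow() and override chosen residues.
residues <- c("A","C","D","E","G","H","I","K","L","M","N","P","Q","R","S",
              "T","V","Y")
rescolors <- setNames(rainbow(length(residues)), residues)
rescolors["A"] <- "purple"   # alanine
rescolors["G"] <- "blue"     # glycine

# position 10 from the posted data
pos10 <- data.frame(residue = c("A", "E"), ratio = c(79, 9))

# look up each residue's color by name, one copy per percentage point
cellcol <- unname(rep(rescolors[as.character(pos10$residue)], pos10$ratio))

# pad with NA so the unknown residues stay uncolored in a 10 x 10 grid
grid <- matrix(c(cellcol, rep(NA, 100 - length(cellcol))), nrow = 10)
table(cellcol)
```

Because the lookup is by letter rather than by factor level, the same residue gets the same color at every position. The vector cellcol could take the place of rescol in the loop above.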

That might get you started. In fact, I might even write a waffle plot 
function for plotrix.

Jim


