[R] Fw: volcano plot.r

Brian Diggs diggsb at ohsu.edu
Tue Jul 5 18:58:12 CEST 2011


See inline below,

On 7/4/2011 8:00 PM, Ungku Akashah wrote:
> Hello.
>
> My name is Akashah. i work at metabolic laboratory. From my study, i
> found that volcano plot can help a lot in my section. i already
> studied about the volcano plot and get the coding to run in R
> software, unfortunately, there is may be something wrong with the
> coding. This is because no graph appear, but no error (blue color
> text) was shown on the R console. Below is the coding for volcano
> plot, i hope anybody can help me to solve the problem.
>
>
>
> #    volcano_plot.r
> #
> #    Author:    Amsha Nahid, Jairus Bowne, Gerard Murray
> #    Purpose:    Produces a volcano plot
> #
> #    Input:    Data matrix as specified in Data-matrix-format.pdf
> #    Output:    Plots log2(fold change) vs log10(t-test P-value)
> #
> #    Notes:    Group value for control must be alphanumerically first
> #              Script will return an error if there are more than 2 groups
>
> #
> #    Load the data matrix
> #
> # Read in the .csv file
> data<-read.csv("file:///Users/nadya/Desktop/praktikal UTM/TASKS1/RT BE EMS 300-399.csv", sep=",", row.names=1, header=TRUE)

We don't have your example file, so it is hard to say what is wrong

> # Get groups information
> groups<-data[,1]
> # Get levels for groups
> grp_levs<-levels(groups)
> if (length(levels(groups))>  2)
>      print("Number of groups is greater than 2!") else {
>
>      #
>      #    Split the matrix by group
>      #
>      new_mats<-c()
>      for (ii in 1:length(grp_levs))
>          new_mats[ii]<-list(data[which(groups==levels(groups)[ii]),])
>
>      #
>      #    Calculate the means
>      #
>      # For each matrix, calculate the averages per column
>      submeans<-c()
>      # Preallocate a matrix for the means
>      means<-matrix(
>          nrow = 2,
>          ncol = length(colnames(data[,-1])),
>          dimnames = list(grp_levs,colnames(data[,-1]))
>          )
>      # Calculate the means for each variable per sample
>      for (ii in 1:length(new_mats))
>          {submeans[ii]<-list(apply(new_mats[[ii]][,-1],2,mean,na.rm=TRUE))
>          means[ii,]<-submeans[[ii]]}
>
>      #
>      #    Calculate the fold change
>      #
>      folds<-matrix(
>          nrow=length(means[,1]),
>          ncol=length(means[1,]),
>          dimnames=list(rownames(means),colnames(means))
>          )
>      for (ii in 1:length(means[,1]))
>          for (jj in 1:length(means[1,]))
>              folds[ii,jj]<-means[ii,jj]/means[1,jj]
>
>      #
>      #    t-test P value data
>      #
>      pvals<-matrix(nrow=ncol(data[,-1]),ncol=1,dimnames=list(colnames(data[-1]),"P-Value"))
>
>      #
>      #    Perform the t-Test
>      #
>      for(ii in 1:nrow(pvals)) {
>          pvals[ii,1]<-t.test(new_mats[[1]][,ii+1],new_mats[[2]][,ii+1])$p.value
>          }

Everything up to here is just to process the data into the format you 
want to plot it in.  If you provided the data at this stage (folds and 
pvals), then it would be easier to determine what is going on.  I 
created some dummy data from which to start at this point.

folds <- rbind(0,exp(rnorm(500)))
pvals <- runif(500)

>      m<-length(pvals)
>      x_range<-range(c(
>          min(
>              range(log2(folds[2,])),
>              range(c(-1.5,1.5))
>              ),
>          max(
>              range(log2(folds[2,])),
>              range(c(-1.5,1.5))
>              )
>          ))
>      y_range<-range(c(
>          min(range(-log10(pvals)),
>              range(c(0,2))
>              ),
>          max(range(-log10(pvals)),
>              range(c(0,2))
>              )
>          ))
>
>      #
>      #    Plot data
>      #
>      # Define a function, since it's rather involved
>      volcano_plot<-function(fold, pval)
>          {plot(x_range,                                 # x-dim
>              y_range,                                   # y-dim
>              type="n",                                  # empty plot
>              xlab="log2 Fold Change",                   # x-axis title
>              ylab="-log10 t-Test P-value",              # y-axis title
>              main="Volcano Plot",                       # plot title
>              )
>              abline(h=-log10(0.05),col="green",lty="44")# horizontal line at P=0.05
>              abline(v=c(-1,1),col="violet",lty="1343")  # vertical lines at 2-fold
>              # Plot points based on their values:
>              for (ii in 1:m)
>                  # If it's below 0.05, we're not overly interested: purple.
>                  if (-log10(pvals[ii])>(-log10(0.05))) {
>                      # Otherwise, more checks;
>                      # if it's greater than 2-fold decrease: blue
>                      if (log2(folds[2,][ii])>(-1)) {
>                          # If it's significant but didn't change much: orange
>                          if (log2(folds[2,][ii])<1) {
>                              points(
>                                  log2(folds[2,][ii]),
>                                  -log10(pvals[ii]),
>                                  col="orange",
>                                  pch=20
>                                  )
>                              # Otherwise, greater than 2-fold increase: red
>                              } else {
>                                  points(
>                                      log2(folds[2,][ii]),
>                                      -log10(pvals[ii]),
>                                      col="red",
>                                      pch=20
>                                      )
>                              }
>                          } else {
>                              points(
>                                  log2(folds[2,][ii]),
>                                  -log10(pvals[ii]),
>                                  col="blue",
>                                  pch=20
>                                  )
>                              }
>                      } else {
>                          points(
>                              log2(folds[2,][ii]),
>                              -log10(pvals[ii]),
>                              col="purple",
>                              pch=20
>                              )
>                      }
>          }
>      # Plot onscreen via function
>      x11()
>      volcano_plot(folds,pvals)

Running the above line gives me a plot.  So maybe the problem is 
specific to your data.  Without it, no one can help you further.  Look 
at folds and pvals right before calling volcano_plot; are there values 
you didn't expect?  Do the same with log2(folds) and -log10(pvals).

And if you do give the data, make it minimal.  That is, strip it down to 
just the parts needed to make the volcano plot.  Also, you can rearrange 
your code to compute x_range, y_range, and m inside volcano_plot so that 
the function does not depend on any variables other than the ones passed 
(folds and pvals).  Another suggestion to simplify your code is to 
compute the colors using vector operations, making a single vector of 
colors corresponding to each point, and thus needing only a single 
points call. (You can start by replacing all your points calls with 
color[ii] <- "color name", and then have a single points call outside 
all those ifs):

volcano_plot<-function(fold, pval) {
     m<-length(pvals)
     x_range <- range(c(log2(folds[2,]), -1.5, 1.5))
     y_range <- range(c(-log10(pvals), 0, 2))

     plot(x_range,                                 # x-dim
          y_range,                                   # y-dim
          type="n",                                  # empty plot
          xlab="log2 Fold Change",                   # x-axis title
          ylab="-log10 t-Test P-value",              # y-axis title
          main="Volcano Plot",                       # plot title
          )
     abline(h=-log10(0.05),col="green",lty="44")# horizontal line at P=0.05
     abline(v=c(-1,1),col="violet",lty="1343")  # vertical lines at 2-fold

     ## Define colors based on their values:
     color <- c()
     for (ii in 1:m)
         ## If it's below 0.05, we're not overly interested: purple.
         if (-log10(pvals[ii])>(-log10(0.05))) {
             ## Otherwise, more checks;
             ## if it's greater than 2-fold decrease: blue
             if (log2(folds[2,][ii])>(-1)) {
                 ## If it's significant but didn't change much: orange
                 if (log2(folds[2,][ii])<1) {
                     color[ii] <- "orange"
                     ## Otherwise, greater than 2-fold increase: red
                 } else {
                     color[ii] <- "red"
                 }
             } else {
                 color[ii] <- "blue"
             }
         } else {
             color[ii] <- "purple"
         }
     points(
            log2(folds[2,]),
            -log10(pvals),
            col=color,
            pch=20
            )

}

# example dummy data
folds <- rbind(0,exp(rnorm(500)))
pvals <- runif(500)

volcano_plot(folds, pvals)


If you vectorize the computation of the colors, it simplifies even more:

volcano_plot<-function(fold, pval) {
     x_range <- range(c(log2(folds[2,]), -1.5, 1.5))
     y_range <- range(c(-log10(pvals), 0, 2))

     plot(x_range,                                 # x-dim
          y_range,                                   # y-dim
          type="n",                                  # empty plot
          xlab="log2 Fold Change",                   # x-axis title
          ylab="-log10 t-Test P-value",              # y-axis title
          main="Volcano Plot",                       # plot title
          )
     abline(h=-log10(0.05),col="green",lty="44")# horizontal line at P=0.05
     abline(v=c(-1,1),col="violet",lty="1343")  # vertical lines at 2-fold

     ## Define colors based on their values:
     ## Not significant: purple
     ## Significant and smaller than half fold change: blue
     ## Significant and larger than two fold change: red
     ## Significant but between half and two fold change: orange
     color <- ifelse(-log10(pvals)>(-log10(0.05)),
                     ifelse(log2(folds[2,])>(-1),
                            ifelse(log2(folds[2,]<1),
                                   "orange",
                                   "red"),
                            "blue"),
                     "purple")
     points(
            log2(folds[2,]),
            -log10(pvals),
            col=color,
            pch=20
            )

}

volcano_plot(folds, pvals)

>      # Return table to analyse results
>
>      #
>      #    Generate figures as image files
>      #
>      #    (Uncomment blocks as necessary)
>
>      ##### jpeg #####
>      # pic_jpg<-function(filename, fold, pval)
>      #     {# Start jpeg device with basic settings
>      #     jpeg(filename,
>      #         quality=100,                           # image quality (percent)
>      #         bg="white",                            # background colour
>      #         res=300,                               # image resolution (dpi)
>      #         units="in", width=8.3, height=5.8)     # image dimensions (inches)
>      #     par(mgp=c(5,2,0),                          # axis margins
>      #                                                # (title, labels, line)
>      #         mar=c(6,6,4,2),                        # plot margins (b,l,t,r)
>      #         las=1                                  # horizontal labels
>      #         )
>      #     # Draw the plot
>      #     volcano_plot(folds, pvals)
>      #     dev.off()
>      #     }
>      # pic_jpg("volcano_plot.jpg")
>      ##### end jpeg #####
>
>
>      #### png #####
>      # pic_png<-function(filename, fold, pval)
>      #     {# Start png device with basic settings
>      #     png(filename,
>      #         bg="white",                            # background colour
>      #         res=300,                               # image resolution (dpi)
>      #         units="in", width=8.3, height=5.8)     # image dimensions (inches)
>      #     par(mgp=c(5,2,0),                          # axis margins
>      #                                                # (title, labels, line)
>      #         mar=c(6,6,4,2),                        # plot margins (b,l,t,r)
>      #         las=1                                  # horizontal labels
>      #         )
>      #     # Draw the plot
>      #     volcano_plot(folds, pvals)
>      #     dev.off()
>      #     }
>      # pic_png("volcano_plot.png")
>      #### end png #####
>
>
>      # #### tiff #####
>      # pic_tiff<-function(filename, fold, pval)
>      #     {# Start tiff device with basic settings
>      #     tiff(filename,
>      #         bg="white",                            # background colour
>      #         res=300,                               # image resolution (dpi)
>      #         units="in", width=8.3, height=5.8)     # image dimensions (inches)
>      #         compression="none"                     # image compression
>      #                                                #  (one of none, lzw, zip)
>      #     par(mgp=c(5,2,0),                          # axis margins
>      #                                                # (title, labels, line)
>      #         mar=c(6,6,4,2),                        # plot margins (b,l,t,r)
>      #         las=1                                  # horizontal labels
>      #         )
>      #     # Draw the plot
>      #     volcano_plot(folds, pvals)
>      #     dev.off()
>      #     }
>      # pic_tiff("volcano_plot.tif")
>      # #### end tiff #####
>
>
>
>
>      #
>      #    Legacy code which allows for blue/red to be independent of 0.05
>      #    (purple is limited to the middle lower region)
>      #
>      #####
>      #     for (ii in 1:m)
>      #         if (log2(folds[2,][ii])<1) {
>      #             if (log2(folds[2,][ii])>-1) {
>      #                 if (-log10(pvals[ii])<(-log10(0.05))) {
>      #                     points(
>      #                         log2(folds[2,][ii]),
>      #                         -log10(pvals[ii]),
>      #                         col="purple",
>      #                         pch=20
>      #                         )
>      #                     } else {
>      #                         points(
>      #                             log2(folds[2,][ii]),
>      #                             -log10(pvals[ii]),
>      #                             col="orange",
>      #                             pch=20
>      #                             )
>      #                     }
>      #                 } else {
>      #                     points(
>      #                         log2(folds[2,][ii]),
>      #                         -log10(pvals[ii]),
>      #                         col="blue",
>      #                         pch=20
>      #                         )
>      #                     }
>      #             } else {
>      #                 points(
>      #                     log2(folds[2,][ii]),
>      #                     -log10(pvals[ii]),
>      #                     col="red",
>      #                     pch=20
>      #                     )
>      #             }
>
> # If function from above needs to be closed
> }
> 	[[alternative HTML version deleted]]
>
>
>
>


-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University



More information about the R-help mailing list