[BioC] SmearPlot edgeR - adding gene names to genes of interest

Christine Gläßer c.glaesser at zmbh.uni-heidelberg.de
Mon Dec 16 13:53:40 CET 2013


Am 16.12.2013 11:26, schrieb Aliaksei Holik:
> Hi Christine,
>
> There's probably a more elegant way, but that's how I do it.
>
> You should be able to use R's 'text' function by taking logCPM values 
> of the genes you're interested in as 'x' argument, logFC values as 'y' 
> argument, and the gene names or ID's as 'labels' argument.
>
> Say top.x is your resulting "TopTags" object and it contains dataframe 
> called 'table', which contains among others the following columns:
> Symbols - logFC - logCPM
>
> You should be able to subset it to the genes you're interested in by:
>
> ids <- c("...", "...")
> gene.labels <- x$table[x$table$Symbols %in% ids,]
>
> Where 'ids' is a vector of gene symbols for genes of interest, but you 
> can use Entrez Gene IDs or any other thing you can think of, for 
> instance all genes with logFC > 1 etc.
>
> After you've plotted your smear plot from DGELRT or DGEExact object 
> you can mark the genes of interest with the following:
>
> text(x=gene.labels$logCPM,
>      y=gene.labels$logFC,
>      labels=gene.labels$Symbols, cex=0.7, pos=1)
>
> Note that this will not work if you're plotting the SmearPlot from 
> DGEList as your TopTags object will contain squeezed logCPM values 
> different from those in DGEList.
>
> Hope it helps,
>
> Aliaksei.
>
> On 16/12/13 7:07 PM, Christine Gläßer wrote:
>> Dear all,
>>
>> I have used edgeR for analyzing differential expression. I also created
>> a smear plot visualizing DGE results (code see below). I'd now like to
>> highlight some genes of interest (they are significantly diff.
>> expressed) in this plot by labeling them; do you know if there's a way
>> to do so?
>>
>> Kind regards, and many thanks in advance,
>>
>> Christine Gläßer
>>
>>
>> ###### Code ########
>>
>> counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t",
>> row.names = 1, as.is = TRUE))
>> cpms<-cpm(counts)
>> keep <- rowSums(cpms>1)>=4  ### at least 4 replicates
>> counts <- counts[keep,]
>>
>> group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3))
>>
>> d <- DGEList(counts=counts, group=group)
>> d <- calcNormFactors(d)
>>
>> design <- model.matrix(~ 0+group)
>> colnames(design) <- cols
>>
>> contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2,
>> levels=design)
>>
>> y <- estimateGLMCommonDisp(d,design)
>> y <- estimateGLMTrendedDisp(y,design)
>> y <- estimateGLMTagwiseDisp(y,design)
>> fit <- glmFit(y,design)
>>
>> for (i in colnames(contrasts))
>> {
>>    print (i)
>>
>>    outfilename <- paste(paste(outdir, i, sep="/"))
>>    outfilename <- paste(paste(outfilename, ".txt", sep=""))
>>    outfilesmearplot <- paste(paste(outdir, i, sep="/"))
>>    outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf",
>> sep=""))
>>
>>    lrt <- glmLRT(fit, contrast=contrasts[,i])
>>    tt <- topTags(lrt, n=nrow(d), adjust.method="BH")
>>    rn <- rownames(tt$table)
>>    deg <- rn[tt$table$FDR < 0.05]
>>
>>    pdf(outfilesmearplot)
>>   ###### SMEAR PLOT STARTS HERE #######
>>    plotSmear(d, de.tags=deg)
>>    abline(h = c(-1, 1), col = "dodgerblue")
>>    dev.off()
>>    write.table(tt$table, file=outfilename, sep="\t")
>> }
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
Dear Aliaksei,

thank you very much for your help, it worked out. As an example, I have 
adapted the example given in the documentation in plotSmear:

##################

y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y))
rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".")
d <- estimateCommonDisp(d)

      # find differential expression
de <- exactTest(d)

      # highlighting the top 500 most DE tags
de.tags <- rownames(topTags(de, n=500)$table)

plotSmear(d, de.tags=de.tags)

de$genes <- rownames(d$counts)

ids <- c("tag.2323", "tag.9")
gene.labels <- de$table[de$genes %in% ids,]
text(x=gene.labels$logCPM, y=gene.labels$logFC, 
labels=rownames(gene.labels), cex=0.7, pos=1)

###################

I'd also like to add a line pointing at these specific genes (or 
highlight them a bit better, e.g. with a specific color) - otherwise the 
plot is not very informative, since there are many dots in close 
neighborhood to my genes of interest. Do you also know how that could be 
done?

Thank you very much for your help, I greatly appreciate it!

Kind regards,

Christine Gläßer



More information about the Bioconductor mailing list