[BioC] DESeq - plotPCA
Zaki Fadlullah [guest]
guest at bioconductor.org
Wed Mar 6 08:43:27 CET 2013
Hi mailing list,
I have a question regarding the plotPCA function in DESeq.
Looking into the plotPCA code, I realised that the PCA uses only a subset of 500 genes (ntop = 500; 500 is just an example, as this number can be adjusted). Am I correct in understanding that these 500 genes are the most variable genes?
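library(Biobase); library(genefilter); library(RColorBrewer); library(lattice)
plotPCA = function(x, intgroup, ntop=500)
{
  # per-gene variance across all samples
  rv = rowVars(exprs(x))
  # row indices of the ntop genes with the largest variance
  select = order(rv, decreasing=TRUE)[seq_len(ntop)]
  # PCA on the selected genes, with samples as rows
  pca = prcomp(t(exprs(x)[select,]))
  # one factor level per combination of the intgroup columns
  # (x rather than the vignette's vsdFull object, so the function uses its own argument)
  fac = factor(apply(pData(x)[, intgroup, drop=FALSE], 1, paste, collapse=" : "))
  colours = brewer.pal(nlevels(fac), "Paired")
  pcafig = xyplot(PC2 ~ PC1, groups=fac, data=as.data.frame(pca$x), pch=16, cex=2,
    aspect = "iso", col=colours,
    main = draw.key(key = list(
      rect = list(col = colours),
      text = list(levels(fac)),
      rep = FALSE)))
  pcafig  # return the lattice plot object
}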
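To make the selection step concrete, here is a minimal sketch of how I understand it, using base R only. The toy matrix, the gene names g1 to g6 and the value ntop = 2 are all made up for illustration, and apply(m, 1, var) is my stand-in for rowVars(); this is not the DESeq code itself.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns).
# All values are invented for illustration.
m <- rbind(
  g1 = c(5, 5, 5, 5),    # constant across samples
  g2 = c(1, 9, 1, 9),    # varies strongly
  g3 = c(4, 5, 4, 5),
  g4 = c(0, 0, 10, 10),  # varies most strongly
  g5 = c(2, 2, 2, 3),
  g6 = c(7, 3, 7, 3)
)

# Per-gene variance across samples (base-R stand-in for rowVars()).
rv <- apply(m, 1, var)

# Keep the ntop genes with the largest variance.
ntop <- 2
select <- order(rv, decreasing = TRUE)[seq_len(ntop)]
top_genes <- rownames(m)[select]

# PCA on the selected genes only; samples become the rows of pca$x.
pca <- prcomp(t(m[select, ]))
```

If I read this correctly, "most variable" here simply means the largest variance of a gene's values across the samples, and the points in the PCA plot are the samples, not the genes.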
Specifically, what is actually meant by "most variable genes", and why would one use the most variable genes in a PCA plot?
Would a valid conclusion be: if the 500 most variable genes cluster together (as seen in the PCA plot [Figure 17] in the DESeq vignette), our expression data is good, because even the most variable genes group together?
More generally (not DESeq-specific): if the purpose of a PCA is to get a general overview of the data, would it be better to do the PCA on all of the genes rather than on a subset (say 500)?
I would appreciate any insight into this matter, as I am new to R and RNA-seq.
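To illustrate what I mean by comparing the two approaches, here is a rough sketch on simulated data. Everything in it is an assumption for illustration: the matrix size, the two groups "A" and "B", the shift of 4 applied to the first 50 genes, the subset size of 100 and the helper var_explained() are all hypothetical and not taken from DESeq.

```r
set.seed(1)

# Simulated data: 1000 genes x 6 samples, two groups of 3 samples each.
n_genes <- 1000
n_samples <- 6
group <- rep(c("A", "B"), each = 3)
m <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Assumed group effect: the first 50 genes are shifted upwards in group B.
m[1:50, group == "B"] <- m[1:50, group == "B"] + 4

# Select the 100 most variable genes, as plotPCA does with ntop.
rv <- apply(m, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(100)]

# PCA on all genes versus PCA on the most variable subset.
pca_all <- prcomp(t(m))
pca_top <- prcomp(t(m[select, ]))

# Hypothetical helper: fraction of total variance carried by each PC.
var_explained <- function(p) p$sdev^2 / sum(p$sdev^2)
pc1_all <- var_explained(pca_all)[1]
pc1_top <- var_explained(pca_top)[1]
```

Comparing pc1_all with pc1_top, and plotting pca_all$x against pca_top$x coloured by group, is the kind of check I have in mind when asking whether the subset or the full gene set gives the better overview.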
-- output of sessionInfo():
Sent via the guest posting facility at bioconductor.org.