[BioC] limma topTable question

Gordon Smyth smyth at wehi.edu.au
Thu Apr 7 01:57:13 CEST 2005


At 11:32 AM 6/04/2005, Cyrus Harmon wrote:
>Dear Gordon,
>
>Thanks for your comments. I have attached some additional remarks below. 
>Please note that these comments all fall in the minor nit category, my 
>immediate problems have been solved, and I'm very pleased with limma. That 
>being said, I have a few comments that I think speak to making limma more 
>useable by novice users such as myself.
>
>On Apr 5, 2005, at 4:37 PM, Gordon Smyth wrote:
>
>>I am not convinced that this would be a useful option. Microarray 
>>differential expression analyses are virtually always two-sided, for good 
>>reason, because researchers need to know about genes moving strongly down 
>>as well as up. Hence topTable() is designed to facilitate a two-sided 
>>analysis. You are implying that you want to do an anlaysis in which you 
>>don't want to even see the genes moving down. Why do you think that this 
>>is a generally useful analysis? I haven't seen any microarray problems 
>>which I would want to analyse that way.
>
>I can certainly see the case for being interested in genes moving up and 
>down. The reason I wanted the particular feature described was that I had 
>hacked together a simple way of looking at the top and bottom N genes for 
>a given expt. relative to the others in a set via either, basically, M or 
>A. Limma, of course, offers a much better way of doing this and more with 
>contrasts and fitting. However, I still wanted to see the top and bottom N 
>for comparison with what I had done before. I agree that both up- and 
>down-regulated genes are interesting, but by sorting by abs(M), we are 
>limiting ourselves to the top N in abs(M). In this case I wanted top N/2 
>by M and bottom N/2 by M. I certainly don't claim to be an expert on any 
>of this; I do claim to be "surprised" by the fact that sorting on M was 
>sorting on abs(M). (Granted it was easy enough to figure out what was 
>going on, but the principle of least surprise is always good.)

I tend to aim the limma package at wet-lab biological collaborators, and 
tend to put into the package only things that I feel confident about 
recommending to them. (This isn't true of everything on bioconductor -- 
much of bioconductor is aimed more at computational biologists or software 
developers.) The trouble with looking for a fixed number of genes up and a 
fixed number down is that the cutoff threshold, in terms of evidence for 
differential expression, will be different for the two lists. This is why I 
am reluctant to recommend it to biological collaborators and to include it 
as an option in topTable().

This isn't to say that getting N/2 genes up and down isn't useful for some 
purposes, e.g., comparisons of analysis methodologies. It is just that this 
sort of work (meta analysis) tends to be done by experienced programmers, 
probably including yourself, rather than by wet-lab biologists, and these 
people tend to want modular tools rather than ready-made solutions.

What I would like to add is a facility in limma to get a topTable like 
summary for any specified subset of genes rather than by ranking. This 
might be added to the topTable() function, but there are other ways as 
well. One can already select a subset of an MArrayLM object. E.g., to 
reduce to only the top 100 genes in terms of fold change up for a linear 
model with two coefs:

o <- order(fit$coef[,2], decreasing=TRUE)
fit.top100 <- fit[o[1:100],]

although this facility still has some bugs in the current limma version. 
You can also select only the coefficient of interest:

fit.top100 <- fit.top100[,2]

In the next limma version you will be able to coerce a fitted model object 
to a data.frame, e.g.,

tab <- as.data.frame(fit.top100)

which gives you something very much like the topTable() summary.

which gives you something very
>>If you have a need which is specific to your own situation, and not 
>>likely to be of wide interest, then Sean and Kasper have explained how 
>>you can make your own functions.
>
>Works for me. I now have options 'U' and 'D' which do the obvious thing 
>with M descending and ascending.
>
>>>Also, the description of what is going on here in the help page is
>>>rather cryptic. It is only in the discussion of resort.by that the abs
>>>thing comes up. Furthermore, a better description of what "M" "A", "T",
>>>"P" and "B" are would be helpful.
>>
>>You don't say, but I assume that you are refering to the list of possible 
>>values for the 'sort.by' argument to 'topTable'. Confusion about ranking 
>>on M vs abs(M) does not normally arise, because microarray analysis are 
>>virtually always two-sided and topTable() provides the behaviour which 
>>users generally expect.
>
>Perhaps that should read "experienced users" as I expected something 
>different, but clearly I'm new to all of this.

I am guessing that you come from a programming background, and programmers 
would naturally expect the behaviour to be procedural. My experience with 
non-mathematical biologists is that they expect the behaviour to be 
two-sided. Anyway, I will make it more explicit in the documentation.

>>>  They are each discussed in the
>>>context of the Values (for M, t, p and b) and in the Arguments (for A),
>>>but it would be nice if in the details section, this was recapitulated,
>>>perhaps with a description of what an A value is,
>>
>>See the User's Guide section "Statistics for differential expression".
>
>Fair enough, but a simple recapitulation of what these options mean in the 
>topTable help page would help folks like me. As it is written, the options 
>are enumerated for sort.by and resort.by and M, T, P, and B are described 
>as columns of the return value, but it isn't explicitly stated that the 
>input values to sort.by and resort.by are, basically, but not strictly, 
>column names of the output. I suppose it's obvious, but a clearer 
>description of the argument would help and would provide a place to inform 
>the user that selecting sort.by="M" really means abs(M).
>
>>>  and if the case
>>>options were spelled out. It seems the only t and p are allowed to be
>>>lowercase, but it's not clear why.
>>>(Combining my question and my gripe, a sort by "m" that didn't do
>>>abs(M) would seem useful to me, but perhaps I'm missing something.)
>>
>>R is a case-sensitive language, and it doesn't seem such a hardship to 
>>use a capital "M" in sort.by="M", as per the documentation. The reason 
>>that "M", "A" and "B" are treated strictly as upper-case is because these 
>>symbols are always uppercase in the published microarray literature.
>
>It's no hardship, I'm just arguing for completeness of the docs. If you 
>accept 'p' and 't' as synonyms, for 'P' and 'T', say so.

I will make it more explicit.

Gordon

>Thanks for your efforts in creating limma! It's a rather handy tool and I 
>look forward to using it more.
>
>Sincerely,
>
>Cyrus
>



More information about the Bioconductor mailing list