[BioC] limma topTable question

Wed Apr 6 03:32:51 CEST 2005

Dear Gordon,

Thanks for your comments. I have attached some additional remarks 
below. Please note that these comments all fall in the minor nit 
category, my immediate problems have been solved, and I'm very pleased 
with limma. That being said, I have a few comments that I think speak 
to making limma more useable by novice users such as myself.

On Apr 5, 2005, at 4:37 PM, Gordon Smyth wrote:

> I am not convinced that this would be a useful option. Microarray 
> differential expression analyses are virtually always two-sided, for 
> good reason, because researchers need to know about genes moving 
> strongly down as well as up. Hence topTable() is designed to 
> facilitate a two-sided analysis. You are implying that you want to do 
> an anlaysis in which you don't want to even see the genes moving down. 
> Why do you think that this is a generally useful analysis? I haven't 
> seen any microarray problems which I would want to analyse that way.

I can certainly see the case for being interested in genes moving up 
and down. The reason I wanted the particular feature described was that 
I had hacked together a simple way of looking at the top and bottom N 
genes for a given expt. relative to the others in a set via either, 
basically, M or A. Limma, of course, offers a much better way of doing 
this and more with contrasts and fitting. However, I still wanted to 
see the top and bottom N for comparison with what I had done before. I 
agree that both up- and down-regulated genes are interesting, but by 
sorting by abs(M), we are limiting ourselves to the top N in abs(M). In 
this case I wanted top N/2 by M and bottom N/2 by M. I certainly don't 
claim to be an expert on any of this; I do claim to be "surprised" by 
the fact that sorting on M was sorting on abs(M). (Granted it was easy 
enough to figure out what was going on, but the principle of least 
surprise is always good.)

> If you have a need which is specific to your own situation, and not 
> likely to be of wide interest, then Sean and Kasper have explained how 
> you can make your own functions.

Works for me. I now have options 'U' and 'D' which do the obvious thing 
with M descending and ascending.

>> Also, the description of what is going on here in the help page is
>> rather cryptic. It is only in the discussion of resort.by that the abs
>> thing comes up. Furthermore, a better description of what "M" "A", 
>> "T",
>> "P" and "B" are would be helpful.
>
> You don't say, but I assume that you are refering to the list of 
> possible values for the 'sort.by' argument to 'topTable'. Confusion 
> about ranking on M vs abs(M) does not normally arise, because 
> microarray analysis are virtually always two-sided and topTable() 
> provides the behaviour which users generally expect.

Perhaps that should read "experienced users" as I expected something 
different, but clearly I'm new to all of this.

>>  They are each discussed in the
>> context of the Values (for M, t, p and b) and in the Arguments (for 
>> A),
>> but it would be nice if in the details section, this was 
>> recapitulated,
>> perhaps with a description of what an A value is,
>
> See the User's Guide section "Statistics for differential expression".

Fair enough, but a simple recapitulation of what these options mean in 
the topTable help page would help folks like me. As it is written, the 
options are enumerated for sort.by and resort.by and M, T, P, and B are 
described as columns of the return value, but it isn't explicitly 
stated that the input values to sort.by and resort.by are, basically, 
but not strictly, column names of the output. I suppose it's obvious, 
but a clearer description of the argument would help and would provide 
a place to inform the user that selecting sort.by="M" really means 
abs(M).

>>  and if the case
>> options were spelled out. It seems the only t and p are allowed to be
>> lowercase, but it's not clear why.
>> (Combining my question and my gripe, a sort by "m" that didn't do
>> abs(M) would seem useful to me, but perhaps I'm missing something.)
>
> R is a case-sensitive language, and it doesn't seem such a hardship to 
> use a capital "M" in sort.by="M", as per the documentation. The reason 
> that "M", "A" and "B" are treated strictly as upper-case is because 
> these symbols are always uppercase in the published microarray 
> literature.

It's no hardship, I'm just arguing for completeness of the docs. If you 
accept 'p' and 't' as synonyms, for 'P' and 'T', say so.

Thanks for your efforts in creating limma! It's a rather handy tool and 
I look forward to using it more.

Sincerely,

Cyrus