[BioC] Bioconductor documentation

Tue Aug 31 06:32:31 CEST 2004

At 11:58 PM 29/08/2004, Naomi Altman wrote:
>As always, I am grateful to the developers for donating their wonderful 
>software.  However, the issue of why the documentation is hard to use 
>keeps rearing its head, so ...

I'm not sure what you mean by "the issue of why", apart from the obvious 
fact that the software is produced by very busy people as a side product of 
their research lab activities. We can't work full-time on the packages and 
they are never likely to be as fully featured or as fully documented as you
would like. In the case of limma, my aim is for the code and the 
documentation to be of a comparable standard to that of the packages in the 
standard distribution of R (base, stats, graphics, utils, methods). 
Specific comments and suggestions re where that fails to be the case are 
welcome.

>One of the problems I am finding with the Bioconductor documentation is 
>that it is not sufficiently explicit, so I often need to go into the code 
>to determine what the routine is doing.  As 2 examples,
>
>lmFit (limma) can take as input an marrayNorm object and by default 
>extracts "maM".  But if you type ?lmFit, this is not given in the 
>documentation.  I have not looked at the Vignette to see if it is listed 
>there.  However, I see the vignettes as tutorials - I should be able to 
>find out what a routine does from its internal documentation.  The 
>documentation should be explicit about what is extracted from each type of 
>input object

Thanks for this feedback. It is true that the documentation doesn't say 
explicitly which slot or component is extracted from each type of object.
This is partly because it seemed almost self-explanatory. The function 
lmFit() simply extracts the expression data from the appropriate slot or 
component of the input data object. It doesn't do any unexpected processing 
or computation that would require special documentation; rather, the value 
of the appropriate slot is taken as is. Each class of object has only one 
slot or component that could sensibly be extracted in this way.

Anyway, I have written an extra two paragraphs of explanation in the 
Details section of the lmFit() help to make explicit what is extracted from 
each object. This will be in limma 1.7.5 when that is released.

>what is output (if this differs by input object).  I might note that this 
>is particularly cogent for limma, since limma works directly with 
>contrasts for 2-color arrays, but requires an extra contrast step for 
>1-channel arrays.

I don't think that this criticism is fair. The output from lmFit() does not 
vary depending on the input object. It is central to the philosophy of 
limma that all the models fitted produce an object of the same MArrayLM 
class, with output components that have the same meaning. It is true that 
one will want to fit different models depending on the meaning of the input 
data, but it is the user's responsibility to choose a sensible model and to 
interpret the output appropriately. The situation is very closely analogous 
to that of lm() in the stats package.

It is not true that the fitted model requires an extra contrasts step for 
1-channel arrays; rather, one may use lmFit() with or without 
contrasts.fit() for both 2-color and 1-color arrays. See for example Section 
8.3 of the User's Guide which analyses an affy data set without using 
contrasts.fit(). For another analysis described in Section 8.4, 
contrasts.fit() is used only to obtain F-statistics for a pair of 
coefficients of interest. Otherwise the analysis would stand without the 
use of contrasts.fit().

It is actually impossible for lmFit() to determine whether the expression 
values being input are log-ratios or log-expression values when the input 
is a matrix or an exprSet. The affy package, for example, outputs exprSet 
objects which contain log-expression values, while coercion to exprSet from 
an marrayNorm object produces an exprSet object which contains log-ratios. For 
this reason it would be impossible for lmFit() to output a different class 
of object depending on the type of input data.

Gordon

>Similarly, I cannot tell from the documentation for maNorm or maNormMain 
>whether the background values are used in the normalization.  I.e. the 
>documentation should state which component of the input object will be 
>used and how.
>
>Thanks.
>
>Naomi S. Altman                                814-865-3791 (voice)
>Associate Professor
>Bioinformatics Consulting Center
>Dept. of Statistics                              814-863-7114 (fax)
>Penn State University                         814-865-1348 (Statistics)
>University Park, PA 16802-2111