[BioC] Bioconductor documentation
Gordon Smyth
smyth at wehi.edu.au
Tue Aug 31 06:32:31 CEST 2004
At 11:58 PM 29/08/2004, Naomi Altman wrote:
>As always, I am grateful to the developers for donating their wonderful
>software. However, the issue of why the documentation is hard to use
>keeps rearing its head, so ...
I'm not sure what you mean by "the issue of why", apart from the obvious
fact that the software is produced by very busy people as a side product of
their research lab activities. We can't work full-time on the packages and
they are never likely to be as fully-featured or as fully-documented as you
would like. In the case of limma, my aim is for the code and the
documentation to be of a comparable standard to that of the packages in the
standard distribution of R (base, stats, graphics, utils, methods).
Specific comments and suggestions about where that fails to be the case are
welcome.
>One of the problems I am finding with the Bioconductor documentation is
>that it is not sufficiently explicit, so I often need to go into the code
>to determine what the routine is doing. As 2 examples,
>
>lmFit (limma) can take as input an marrayNorm object and by default
>extracts "maM". But if you type ?lmFit, this is not given in the
>documentation. I have not looked at the Vignette to see if it is listed
>there. However, I see the vignettes as tutorials - I should be able to
>find out what a routine does from its internal documentation. The
>documentation should be explicit about what is extracted from each type of
>input object
Thanks for this feedback. It is true that the documentation doesn't say
explicitly which slot or component is extracted from each type of object.
This is partly because it seemed almost self-explanatory. The function
lmFit() simply extracts the expression data from the appropriate slot or
component of the input data object. It doesn't do any unexpected processing
or computation which would require special documentation; rather, the value
of the appropriate slot is taken as is. Each class of object has only one
slot or component which could be sensibly extracted in this way.
Anyway, I have written an extra two paragraphs of explanation in the
Details section of the lmFit() help to make explicit what is extracted from
each object. This will be in limma 1.7.5 when that is released.
>what is output (if this differs by input object). I might note that this
>is particularly cogent for limma, since limma works directly with
>contrasts for 2-color arrays, but requires an extra contrast step for
>1-channel arrays.
I don't think that this criticism is fair. The output from lmFit() does not
vary depending on the input object. It is central to the philosophy of
limma that all the models fitted produce an object of the same MArrayLM
class, with output components that have the same meaning. It is true that
one will want to fit different models depending on the meaning of the input
data, but it is the user's responsibility to choose a sensible model and to
interpret the output appropriately. The situation is very closely analogous
to that of lm() in the stats package.
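
To make the lm() analogy concrete, here is a minimal sketch in base R. The
data are simulated and the variable names are chosen purely for
illustration; nothing here is taken from a real analysis.

```r
# Simulated data only: one response vector and a hypothetical two-level factor.
set.seed(1)
group <- factor(rep(c("A", "B"), each = 5))
y <- rnorm(10)

# Treatment parametrization: intercept plus the B-vs-A difference.
fit_diff <- lm(y ~ group)

# Group-means parametrization: one coefficient per group.
fit_means <- lm(y ~ 0 + group)

# Both fits have the same class, whatever the user meant by the model.
class(fit_diff)
class(fit_means)

# What differs is the meaning of the coefficients, which the user must interpret.
names(coef(fit_diff))
names(coef(fit_means))
```

lm() has no way of knowing what y measures or which parametrization is the
sensible one; it returns the same kind of object either way.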
It is not true that the fitted model requires an extra contrasts step for
1-channel arrays. Rather, one may use lmFit() with or without
contrasts.fit() for either 2-color or 1-color arrays. See for example Section
8.3 of the User's Guide which analyses an affy data set without using
contrasts.fit(). For another analysis described in Section 8.4,
contrasts.fit() is used only to obtain F-statistics for a pair of
coefficients of interest. Otherwise the analysis would stand without the
use of contrasts.fit().
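
As a rough sketch of the two routes for single-channel data, the following
uses a simulated matrix rather than the User's Guide data sets. It assumes
that limma is installed and is written against the current limma interface
(lmFit(), contrasts.fit(), makeContrasts(), eBayes()); the object names are
hypothetical.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 100 genes by 6 single-channel arrays.
y <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
group <- factor(rep(c("A", "B"), each = 3))

# Route 1: no contrasts.fit(). The design already contains the comparison.
design1 <- model.matrix(~ group)        # columns: (Intercept), groupB
fit1 <- eBayes(lmFit(y, design1))

# Route 2: group-means design followed by contrasts.fit().
design2 <- model.matrix(~ 0 + group)
colnames(design2) <- levels(group)      # columns: A, B
cont <- makeContrasts(BvsA = B - A, levels = design2)
fit2 <- eBayes(contrasts.fit(lmFit(y, design2), cont))

# Both routes give an object of the same class ...
class(fit1)
class(fit2)

# ... and the same estimate of the B-vs-A difference for every gene.
all.equal(unname(fit1$coefficients[, "groupB"]),
          unname(fit2$coefficients[, "BvsA"]))
```

Which route is more convenient depends on the comparisons of interest, not
on whether the arrays are 1-channel or 2-color.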
It is actually impossible for lmFit() to determine whether the expression
values being input are log-ratios or log-expression values when the input
is a matrix or an exprSet. The affy package, for example, outputs exprSet
objects which contain log-expression values, while coercion to exprSet from an
marrayNorm object produces an exprSet object which contains log-ratios. For
this reason it would be impossible for lmFit() to output a different class
of object depending on the type of input data.
Gordon
>Similarly, I cannot tell from the documentation for maNorm or maNormMain
>whether the background values are used in the normalization. I.e. the
>documentation should state which component of the input object will be
>used and how.
>
>Thanks.
>
>Naomi S. Altman 814-865-3791 (voice)
>Associate Professor
>Bioinformatics Consulting Center
>Dept. of Statistics 814-863-7114 (fax)
>Penn State University 814-865-1348 (Statistics)
>University Park, PA 16802-2111