[Bioc-devel] Changes in AnnotationDbi

Thu Jun 4 20:25:05 CEST 2015

On Thu, Jun 4, 2015 at 1:50 PM, James W. MacDonald <jmacdon at uw.edu> wrote:

> In the last release, the warning message from select() telling people that
> their results include one-to-many mappings was removed. While some may find
> this warning annoying, I think silently returning something unexpected to
> our users is dangerous.
>

I would agree.  I have no problem with the warning (maybe a message would
be better).
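
As a rough illustration of the warning-versus-message choice, here is a minimal sketch in base R. The helper name check_one_to_many and its arguments are hypothetical; nothing like it is claimed to exist in AnnotationDbi. It only assumes the result is a data.frame whose key column is the first one.

```r
# Hypothetical helper (not part of AnnotationDbi): flag one-to-many mappings
# in a select()-style result, either as a warning or as a message.
check_one_to_many <- function(res, keycol = 1, use_message = FALSE) {
  # Count rows whose key already appeared in an earlier row
  n_dup <- sum(duplicated(res[[keycol]]))
  if (n_dup > 0) {
    txt <- sprintf("%d row(s) repeat an earlier key (one-to-many mapping)",
                   n_dup)
    if (use_message) message(txt) else warning(txt, call. = FALSE)
  }
  # Return the input unchanged so the helper can wrap an existing call
  invisible(res)
}
```

A message() can be silenced with suppressMessages() and is not turned into an error by options(warn = 2), which is one reason it may feel less intrusive than a warning().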

>
> In other words, for me it is a common practice to do something like this:
>
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> gns <- select(<chippackage>, featureNames(eset), c("ENTREZID","SYMBOL"))
> gns <- gns[!duplicated(gns[,1]),]
> fit2$genes <- gns
>
> I add in the step where dups are removed because I already know they are
> there. But a naive user might instead do
>
> fit2$genes <- select(<chippackage>, featureNames(eset),
>                      c("ENTREZID","SYMBOL"))
>
> Which will work just fine, but then all the annotation (except for the
> first few lines) will now be completely incorrect, and there wasn't a
> warning to let the end user know that they may have made a mistake.
>
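
The misalignment described above can be reproduced without any annotation package. The sketch below uses a made-up data.frame in place of the select() result (the probe IDs, Entrez IDs, and symbols are all invented), and the final match() step is just one possible way to make the alignment explicit.

```r
# Hypothetical probe IDs standing in for featureNames(eset)
ids <- c("p1", "p2", "p3", "p4")

# Invented stand-in for the data.frame returned by select():
# probe "p2" maps to two genes, so it occupies two rows
gns <- data.frame(
  PROBEID  = c("p1", "p2", "p2", "p3", "p4"),
  ENTREZID = c("101", "202", "203", "304", "405"),
  SYMBOL   = c("A", "B1", "B2", "C", "D"),
  stringsAsFactors = FALSE
)

# There are more annotation rows than probes, so using the table as-is
# shifts every row after the first duplicate
same_length <- nrow(gns) == length(ids)
misaligned  <- gns$PROBEID[seq_along(ids)] != ids

# Keep the first mapping per probe, then align explicitly to the probe order
dedup   <- gns[!duplicated(gns$PROBEID), ]
aligned <- dedup[match(ids, dedup$PROBEID), ]
stopifnot(identical(aligned$PROBEID, ids))
```

Here misaligned is TRUE for the third and fourth probes, which is the silent error the removed warning used to hint at.
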
> lmFit() will parse the featureData slot of an ExpressionSet and use those
> data for annotation, so that gives some hypothetical protections, for those
> who first put their annotation data into their ExpressionSet. However,
> ?eSet says:
>
>  ‘featureData’: Contains variables describing features (i.e., rows
>           in ‘assayData’) unique to this experiment. Use the
>           ‘annotation’ slot to efficiently reference feature data
>           common to the annotation package used in the experiment.
>           Class: ‘AnnotatedDataFrame-class’
>
> Which to me indicates that the featureData slot isn't really intended to
> contain annotation data, but instead some unique information that pertains
> to a given experiment. But maybe I misunderstand.
>

I think we did not want to bloat the expression data with annotation if
everything could be resolved via the annotation tag.  I'd speculate that one
of the reasons limma plays nicely with the featureData is to cover the
two-color or custom array case, where no annotation package will resolve the
identifiers.

Now if eBayes returned an S4 class instance ....

>
> Is the featureData slot actually intended for annotation data? If not, what
> is the intended pipeline for annotating data in an ExpressionSet? Am I
> alone in being concerned about this?
>
> Best,
>
> Jim
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
