[Bioc-devel] Changes in AnnotationDbi
Marc Carlson
mcarlson at fredhutch.org
Mon Jun 8 21:12:42 CEST 2015
OK Jim,
I will put very simple messages in (one liners) that will simply state
whether the relationship between keys and the requested columns was 1:1,
1:many, many:1, or many:many. Hopefully this will represent an
acceptable compromise.
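
As a rough illustration of what such a one-liner could report, here is a minimal sketch in base R. The helper name mapping_type() and the toy data frame are hypothetical and not part of AnnotationDbi, and the message wording is only a placeholder. It classifies the relationship between a key column and one requested column in a select()-style result:

```r
# Hypothetical helper (not an AnnotationDbi function): classify the
# relationship between a key column and a value column of a data frame
# shaped like the output of select().
mapping_type <- function(df, keycol, valcol) {
  # Drop unmatched keys (NA values) and repeated key/value pairs first.
  pairs <- unique(df[!is.na(df[[valcol]]), c(keycol, valcol)])
  key_to_many <- any(duplicated(pairs[[keycol]]))  # one key, several values
  val_to_many <- any(duplicated(pairs[[valcol]]))  # several keys, one value
  if (key_to_many && val_to_many) {
    "many:many"
  } else if (key_to_many) {
    "1:many"
  } else if (val_to_many) {
    "many:1"
  } else {
    "1:1"
  }
}

# Toy result: probe "p1" maps to two Entrez Gene IDs.
res <- data.frame(
  PROBEID  = c("p1", "p1", "p2", "p3"),
  ENTREZID = c("10", "11", "12", "13"),
  stringsAsFactors = FALSE
)

# Placeholder wording for the one-line note.
message("Relationship between keys and ENTREZID: ",
        mapping_type(res, "PROBEID", "ENTREZID"))
```

With more than one requested column, a check like this could be run per column and the broadest relationship reported; that choice is an assumption of the sketch, not a description of the eventual implementation.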
Marc
On 06/05/2015 08:37 AM, James W. MacDonald wrote:
> I agree that a warning is probably not the way to go, as it does imply
> that there might have been something wrong with either the input or
> output. Plus, not everybody understands the distinction between error
> and warning.
>
> And having additional documentation can't possibly hurt. But that
> assumes that most/some/all of the end users both peruse and understand
> the documentation, which we all know is not the case. The main issue,
> for me at least, is that a significant proportion of people seem to
> think there is some sort of uniqueness imposed on things like Entrez
> Gene IDs and Hugo symbols, etc. While that is the ultimate goal, we do
> not have and maybe never will achieve unique IDs for each annotatable
> object.
>
> I used to work for a PI who was a very smart, well informed
> statistical geneticist who was absolutely shocked when I informed her
> that a) there are SNPs in dbSNP that have more than one RS ID, and
> that b) there are RS IDs in dbSNP that have been assigned to multiple
> SNPs. She just assumed that there was a one-to-one RS ID -> SNP mapping.
>
> So this is to me the crux of the problem. It is perfectly valid to
> return one-to-many mappings, and that is what should be expected by
> those of us who already understand such things. But for those of us
> who are ignorant of the details, and those who assume uniqueness of
> IDs, it would be really nice if they got a message telling them
> something like
>
> Please note that there are one-to-many mappings between the input and
> output IDs, so the output is longer than your input vector. Please see
> ?select for more detail.
>
> And if the message is objectionable to some, you could give the option
> for people to set a global flag to shut it off. Something like
>
> if(!pleaseMakeItStop)
>     message(<message goes here>)
>
> and they could set
>
> pleaseMakeItStop = TRUE in their .Rprofile
>
> Is that a reasonable compromise?
>
> Jim
>
>
>
> On Thu, Jun 4, 2015 at 6:06 PM, Marc Carlson <mcarlson at fredhutch.org> wrote:
>
> Hi Jim,
>
> I do agree that the warning was protective for that (this is why I
> put it there).
>
> But it was also annoying for many and a source of some confusion
> because when people see a warning() they think that something has
> gone wrong with the code that was just run. And in this case the
> select method was actually doing exactly what it was supposed to
> be doing. What it was actually warning you about was what you did
> separately in that assignment to fit2... Which is the step right
> after the select method already did its work. And I can
> understand why that seems a little bit confusing since you are
> basically telling someone to be careful with the data you just
> gave them.
>
> Now I could replace it with a message() I guess, but in cases like
> this where the warning is about something that happens outside of
> the function you are calling, shouldn't that probably be handled
> by documentation? Or at least, that is the argument that finally
> persuaded me to remove it. That and the fact that almost every
> call to select() ended up accompanied by the warning you
> mentioned, because it turns out that perfect 1:1 relationships are
> pretty rare for annotation data. Very often, you are going to get
> back multiple results.
>
> But I didn't just remove the warning, I also supplied an
> alternative for people who have a real need for consistent 1:1
> mapping.
>
> The mapIds() method takes most of the same arguments as select,
> except that unlike select(), it only looks up one column and it
> always returns a vector that is the same size as the vector that
> came in.
>
> So for your example, you could do something like this pseudocode here:
>
> mapIds(<chippackage>, featureNames(eset), column="ENTREZID",
> keytype="PROBEID")
>
> And mapIds will follow a rule specified by the default value for
> the multiVals argument so that you can get back your results in a
> 1:1 manner. And if you don't like any of the options available
> for the multiVals argument, you can make your own function and
> pass it in.
>
>
> Anyhow, please continue to let us know what you think.
>
>
> Marc
>
>
>
>
>
>
>
> On 06/04/2015 10:50 AM, James W. MacDonald wrote:
>
> In the last release, the warning message from select() telling people
> that their results include one-to-many mappings was removed. While some
> may find this warning annoying, I think silently returning something
> unexpected to our users is dangerous.
>
> In other words, for me it is a common practice to do something like this:
>
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> gns <- select(<chippackage>, featureNames(eset),
> c("ENTREZID","SYMBOL"))
> gns <- gns[!duplicated(gns[,1]),]
> fit2$genes <- gns
>
> I add in the step where dups are removed because I already know they are
> there. But a naive user might instead do
>
> fit2$genes <- select(<chippackage>, featureNames(eset),
> c("ENTREZID","SYMBOL"))
>
> Which will work just fine, but then all the annotation (except for the
> first few lines) will now be completely incorrect, and there wasn't a
> warning to let the end user know that they may have made a mistake.
>
> lmFit() will parse the featureData slot of an ExpressionSet and use those
> data for annotation, so that gives some hypothetical protections, for
> those who first put their annotation data into their ExpressionSet.
> However, ?eSet says:
>
> ‘featureData’: Contains variables describing features (i.e., rows
> in ‘assayData’) unique to this experiment. Use the
> ‘annotation’ slot to efficiently reference feature data
> common to the annotation package used in the experiment.
> Class: ‘AnnotatedDataFrame-class’
>
> Which to me indicates that the featureData slot isn't really intended to
> contain annotation data, but instead some unique information that
> pertains to a given experiment. But maybe I misunderstand.
>
> Is the featureData slot actually intended for annotation data? If not,
> what is the intended pipeline for annotating data in an ExpressionSet?
> Am I alone in being concerned about this?
>
> Best,
>
> Jim
>
>
>
>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099