[BioC] Quickest way to convert IDs in a data frame?

Fri Jul 26 01:25:27 CEST 2013

A generic and efficient way to do this in R is usually to use a named
vector as a lookup table. Here is an example:

## Sample data frame
df <- data.frame(ID=paste("g", 1:10, sep=""), t1=rnorm(10), t2=rnorm(10))
df
        ID          t1          t2
    1   g1  0.84906257 -1.10046605
    2   g2 -1.29354187 -0.05610518
    3   g3  1.00362290 -0.82640813
    4   g4  1.61035832 -1.04016446
    5   g5  0.23232417 -0.11921920
    6   g6 -1.89920999 -1.38235047
    7   g7 -0.34786030 -0.16438477
    8   g8 -1.28758867 -1.06968997
    9   g9 -0.71510804 -3.42711282
    10 g10 -0.02800613  0.01825634

## Sample lookup vector: names are the old IDs, values the new IDs
lookup <- paste("g", sample(21:30), sep="")
names(lookup) <- paste("g", sample(1:10), sep="")
lookup
 g5    g4   g10    g6    g9    g1    g3    g7    g8    g2
"g23" "g30" "g25" "g27" "g29" "g21" "g22" "g24" "g26" "g28"

## Replace column with new IDs, keeping the original row order
df[,"ID"] <- lookup[as.character(df$ID)]
df
        ID          t1          t2
    1  g21  0.84906257 -1.10046605
    2  g28 -1.29354187 -0.05610518
    3  g22  1.00362290 -0.82640813
    4  g30  1.61035832 -1.04016446
    5  g23  0.23232417 -0.11921920
    6  g27 -1.89920999 -1.38235047
    7  g24 -0.34786030 -0.16438477
    8  g26 -1.28758867 -1.06968997
    9  g29 -0.71510804 -3.42711282
    10 g25 -0.02800613  0.01825634
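The lookup above is built from random IDs. Here is a small deterministic sketch of the same named-vector approach; all IDs in it are made up for illustration. It shows two properties that matter for the question in this thread:

* The result always has the same length and order as the input, duplicates included.
* IDs missing from the lookup come back as NA instead of being dropped.

```r
# Named vector as a lookup table: names are the old IDs, values the new IDs.
# All IDs here are made up for illustration.
lookup <- c(g1 = "g21", g2 = "g28", g3 = "g22")

ids <- c("g3", "g1", "g3", "g99")   # "g99" has no entry in the lookup

# Indexing by name returns one element per input ID, in input order;
# unknown names give NA. unname() drops the names carried over from lookup.
new_ids <- unname(lookup[ids])
new_ids
```

If the ID column is a factor (the data.frame() default before R 4.0.0), convert it with as.character() first, as done above. Indexing with a factor uses its integer codes, not its labels.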

Thomas

On Thu, Jul 25, 2013 at 10:54:25PM +0000, Enrico Ferrero wrote:
> Hi both,
> 
> Thanks for your insights, this is extremely interesting!
> 
> While I (kind of) understand why NAs get removed, deliberately
> truncating the output that way is probably not what most people
> expect. Would it be worth filing a bug report for this?
> 
> This also brings me back to my original question: what's the simplest
> and most efficient way to create an exact copy of a column containing
> converted IDs in a data.frame?
> 
> I'm surprised there doesn't seem to be an easy ready-to-go solution,
> as I would imagine it is a rather common task to perform. As I
> mentioned in my first post, the for loop function works, but it's
> highly inefficient.
> 
> Any help is greatly appreciated, thank you.
> 
> Best,
> 
> 
> 
> On 25 July 2013 23:18, Hervé Pagès <hpages at fhcrc.org> wrote:
> > Hi James,
> >
> > You're right.
> >
> > It's actually both: NAs *and* duplicated keys that are mapped to
> > more than 1 row are removed from the input. I don't think this
> > is documented.
> >
> > I wonder if select() behavior couldn't be a little bit simpler by
> > either preserving or removing all duplicated keys, and not just some
> > of them (based on somewhat arbitrary criteria).
> >
> > Thanks,
> > H.
> >
> >
> >
> > On 07/25/2013 02:57 PM, James W. MacDonald wrote:
> >>
> >> Hi Enrico and Herve,
> >>
> >> This has to do with duplicate entries, but only when the duplicate entry
> >> maps to many ENTREZID:
> >>
> >>  > select(org.Hs.eg.db, rep("ADORA2A", 4), "ENTREZID", "ALIAS")
> >>      ALIAS ENTREZID
> >> 1 ADORA2A      135
> >> 2 ADORA2A      135
> >> 3 ADORA2A      135
> >> 4 ADORA2A      135
> >>
> >>  > select(org.Hs.eg.db, rep("AGT", 4), "ENTREZID", "ALIAS")
> >>    ALIAS ENTREZID
> >> 1   AGT      183
> >> 2   AGT      189
> >> Warning message:
> >> In .generateExtraRows(tab, keys, jointype) :
> >>    'select' and duplicate query keys resulted in 1:many mapping between
> >> keys and return rows
> >>
> >>  > select(org.Hs.eg.db, "AGT", "ENTREZID", "ALIAS")
> >>    ALIAS ENTREZID
> >> 1   AGT      183
> >> 2   AGT      189
> >> Warning message:
> >> In .generateExtraRows(tab, keys, jointype) :
> >>    'select' resulted in 1:many mapping between keys and return rows
> >>
> >>
> >> So in the instances where a gene symbol maps to more than one ENTREZID,
> >> the output gets truncated, whereas if it is a one-to-one mapping, it
> >> does not.
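One way to spot this kind of truncation is to work out how many rows a join that keeps every query key should return. Each key contributes one row per match, and an unmapped key contributes a single NA row.

Below is a base-R sketch on a toy mapping table, using merge() with all.x = TRUE as the reference join. The ADORA2A and AGT rows echo the output above; "FAKE1" is a hypothetical unmapped key.

```r
# Toy mapping table; ADORA2A/AGT values echo the select() output above,
# "FAKE1" is a made-up key with no mapping.
map <- data.frame(ALIAS    = c("ADORA2A", "AGT", "AGT"),
                  ENTREZID = c("135", "183", "189"),
                  stringsAsFactors = FALSE)
keys <- c("AGT", "AGT", "ADORA2A", "FAKE1")

# Number of mapping rows for each query key (0 if unmapped)
hits <- vapply(keys, function(k) sum(map$ALIAS == k), integer(1))

# A join that keeps every key (duplicates included) and gives unmapped
# keys one NA row should return this many rows: 2 + 2 + 1 + 1
expected_rows <- sum(pmax(hits, 1L))

# merge() with all.x = TRUE behaves that way
full <- merge(data.frame(ALIAS = keys, stringsAsFactors = FALSE), map,
              by = "ALIAS", all.x = TRUE)
c(expected = expected_rows, merged = nrow(full))
```

A select() result with fewer rows than expected_rows computed this way would suggest that some query keys were dropped.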
> >>
> >> Best,
> >>
> >> Jim
> >>
> >>
> >>
> >>
> >> On 7/25/2013 5:06 PM, Enrico Ferrero wrote:
> >>>
> >>> Hi,
> >>>
> >>> Hervé, that's exactly what I'm trying to say.
> >>>
> >>> Attached to this email is a tab delimited file with two columns of
> >>> GeneSymbols (or Aliases), and here is some simple code to reproduce
> >>> the unexpected behaviour:
> >>>
> >>> library(org.Hs.eg.db)
> >>> mydf <- read.table("testdata.txt", sep="\t", header=TRUE, as.is=TRUE)
> >>> mytest <- select(org.Hs.eg.db, keys=mydf$GeneSymbol1, keytype="ALIAS",
> >>>                  cols=c("SYMBOL","ENTREZID","ENSEMBL"))
> >>> # check that mytest has fewer rows than mydf
> >>> nrow(mydf)
> >>> nrow(mytest)
> >>> # pick a random row: they don't match
> >>> mydf[250,]
> >>> mytest[250,]
> >>>
> >>> Ideally, mytest should have the same number of rows as mydf, in the
> >>> same order, so that I can then cbind them.
> >>> If mytest has more rows because of multiple mappings, that's also fine:
> >>> I can always use merge(mydf, mytest), right?
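On the merge() question: the key columns have different names (GeneSymbol1 in mydf, ALIAS in mytest). As a result, merge(mydf, mytest) finds no common column and, as documented in ?merge, falls back to a Cartesian product.

The sketch below uses toy data with illustrative values. It names the keys explicitly, keeps unmapped rows with all.x = TRUE, and restores the original row order with a helper index column; row_id is an invented name.

```r
mydf <- data.frame(GeneSymbol1 = c("AGT", "ADORA2A", "FAKE1"),
                   Value1 = c(2.5, 3, 1.2), stringsAsFactors = FALSE)
mytest <- data.frame(ALIAS = c("ADORA2A", "AGT", "AGT"),
                     ENTREZID = c("135", "183", "189"),
                     stringsAsFactors = FALSE)

# No shared column names, so this is a Cartesian product: 3 x 3 = 9 rows
nrow(merge(mydf, mytest))

# Remember the original order, then join on explicitly named keys
mydf$row_id <- seq_len(nrow(mydf))
merged <- merge(mydf, mytest, by.x = "GeneSymbol1", by.y = "ALIAS",
                all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$row_id), ]
rownames(merged) <- NULL
merged
```

Keys with several IDs still produce several rows, so the result replaces the cbind() step rather than feeding it.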
> >>>
> >>> Thanks a lot to both for your help, it's very appreciated.
> >>> Best,
> >>>
> >>>
> >>> On 25 July 2013 21:32, Hervé Pagès <hpages at fhcrc.org> wrote:
> >>>>
> >>>> Hi Enrico,
> >>>>
> >>>>
> >>>> On 07/25/2013 01:20 PM, James W. MacDonald wrote:
> >>>>>
> >>>>> Hi Enrico,
> >>>>>
> >>>>> Please don't take things off-list (e.g., use reply-all).
> >>>>>
> >>>>>
> >>>>> On 7/25/2013 2:17 PM, Enrico Ferrero wrote:
> >>>>>>
> >>>>>> Hi James,
> >>>>>>
> >>>>>> Thanks very much for your help.
> >>>>>> In my opinion, there is an issue that needs to be solved before
> >>>>>> thinking about what the best approach is.
> >>>>>>
> >>>>>> I don't understand why, but the object created with the call to select
> >>>>>> (test in my example, first.two in yours) has a different number of
> >>>>>> rows from the original object (df in my example). Specifically, it has
> >>>>>> *fewer* rows.
> >>>>
> >>>>
> >>>> I'm surprised it has fewer rows. It can definitely have more, when some
> >>>> of the keys passed to select() are mapped to more than 1 row, but my
> >>>> understanding was that select() would propagate unmapped keys to the
> >>>> output by placing them in rows stuffed with NAs. So maybe I
> >>>> misunderstood how select() works, or its behavior was changed, or
> >>>> there is a bug somewhere. Could you please send the code that allows
> >>>> us to reproduce this? Thanks.
> >>>>
> >>>> H.
> >>>>
> >>>>
> >>>>>> If all symbols were converted to all possible Entrez IDs, I would
> >>>>>> expect it to have more rows, not fewer. To me, it looks like not all
> >>>>>> rows are looked up and returned.
> >>>>>>
> >>>>>> Do you see what I mean?
> >>>>>
> >>>>>
> >>>>> Sure. You could be using outdated gene symbols. Or perhaps you are
> >>>>> using a mixture of symbols and aliases, which is even cooler than
> >>>>> using just symbols:
> >>>>>
> >>>>>   >  symb <- c(Rkeys(org.Hs.egSYMBOL)[1:10], Rkeys(org.Hs.egALIAS2EG)[31:45])
> >>>>>   >  symb
> >>>>>    [1] "A1BG"     "A2M"      "A2MP1"    "NAT1"     "NAT2"     "AACP"
> >>>>>    [7] "SERPINA3" "AADAC"    "AAMP"     "AANAT"    "AAMP"     "AANAT"
> >>>>> [13] "DSPS"     "SNAT"     "AARS"     "CMT2N"    "AAV"      "AAVS1"
> >>>>> [19] "ABAT"     "GABA-AT"  "GABAT"    "NPD009"   "ABC-1"    "ABC1"
> >>>>> [25] "ABCA1"
> >>>>>   >  select(org.Hs.eg.db, symb, "ENTREZID","SYMBOL")
> >>>>>        SYMBOL ENTREZID
> >>>>> 1      A1BG        1
> >>>>> 2       A2M        2
> >>>>> 3     A2MP1        3
> >>>>> 4      NAT1        9
> >>>>> 5      NAT2       10
> >>>>> 6      AACP       11
> >>>>> 7  SERPINA3       12
> >>>>> 8     AADAC       13
> >>>>> 9      AAMP       14
> >>>>> 10    AANAT       15
> >>>>> 11     AAMP       14
> >>>>> 12    AANAT       15
> >>>>> 13     DSPS     <NA>
> >>>>> 14     SNAT     <NA>
> >>>>> 15     AARS       16
> >>>>> 16    CMT2N     <NA>
> >>>>> 17      AAV     <NA>
> >>>>> 18    AAVS1       17
> >>>>> 19     ABAT       18
> >>>>> 20  GABA-AT     <NA>
> >>>>> 21    GABAT     <NA>
> >>>>> 22   NPD009     <NA>
> >>>>> 23    ABC-1     <NA>
> >>>>> 24     ABC1     <NA>
> >>>>> 25    ABCA1       19
> >>>>>   >  select(org.Hs.eg.db, symb, "ENTREZID","ALIAS")
> >>>>>         ALIAS ENTREZID
> >>>>> 1      A1BG        1
> >>>>> 2       A2M        2
> >>>>> 3     A2MP1        3
> >>>>> 4      NAT1        9
> >>>>> 5      NAT1     1982
> >>>>> 6      NAT1     6530
> >>>>> 7      NAT1    10991
> >>>>> 8      NAT2       10
> >>>>> 9      NAT2    81539
> >>>>> 10     AACP       11
> >>>>> 11 SERPINA3       12
> >>>>> 12    AADAC       13
> >>>>> 13     AAMP       14
> >>>>> 14    AANAT       15
> >>>>> 15     DSPS       15
> >>>>> 16     SNAT       15
> >>>>> 17     AARS       16
> >>>>> 18    CMT2N       16
> >>>>> 19      AAV       17
> >>>>> 20    AAVS1       17
> >>>>> 21     ABAT       18
> >>>>> 22  GABA-AT       18
> >>>>> 23    GABAT       18
> >>>>> 24   NPD009       18
> >>>>> 25    ABC-1       19
> >>>>> 26     ABC1       19
> >>>>> 27     ABC1    63897
> >>>>> 28    ABCA1       19
> >>>>> Warning message:
> >>>>> In .generateExtraRows(tab, keys, jointype) :
> >>>>>     'select' and duplicate query keys resulted in 1:many mapping
> >>>>>     between keys and return rows
> >>>>>   >  mget(c("1982","6530","10991"), org.Hs.egGENENAME)
> >>>>> $`1982`
> >>>>> [1] "eukaryotic translation initiation factor 4 gamma, 2"
> >>>>>
> >>>>> $`6530`
> >>>>> [1] "solute carrier family 6 (neurotransmitter transporter,
> >>>>> noradrenalin), member 2"
> >>>>>
> >>>>> $`10991`
> >>>>> [1] "solute carrier family 38, member 3"
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>>> On 25 July 2013 18:17, James W. MacDonald <jmacdon at uw.edu> wrote:
> >>>>>>>
> >>>>>>> Hi Enrico,
> >>>>>>>
> >>>>>>>
> >>>>>>> On 7/25/2013 12:56 PM, Enrico Ferrero wrote:
> >>>>>>>>
> >>>>>>>> Dear James,
> >>>>>>>>
> >>>>>>>> Thanks very much for your prompt reply.
> >>>>>>>> I knew the problem was the for loop, and the select function is
> >>>>>>>> indeed a lot faster than that and works perfectly with toy data.
> >>>>>>>>
> >>>>>>>> However, this is what happens when I try to use it with real data:
> >>>>>>>>
> >>>>>>>>> test <- select(org.Hs.eg.db, keys=df$GeneSymbol, keytype="ALIAS",
> >>>>>>>>> cols=c("SYMBOL","ENTREZID","ENSEMBL"))
> >>>>>>>>
> >>>>>>>> Warning message:
> >>>>>>>> In .generateExtraRows(tab, keys, jointype) :
> >>>>>>>>      'select' and duplicate query keys resulted in 1:many mapping
> >>>>>>>>      between keys and return rows
> >>>>>>>>
> >>>>>>>> which is probably the warning you mentioned.
> >>>>>>>
> >>>>>>>
> >>>>>>> That's not the warning I mentioned, but it does point out the same
> >>>>>>> issue, which is that there is a one-to-many mapping between symbol
> >>>>>>> and Entrez Gene ID.
> >>>>>>>
> >>>>>>> So now you have to decide if you want to be naive (or stupid,
> >>>>>>> depending on your perspective) or not. You could just cover your
> >>>>>>> eyes and do this:
> >>>>>>>
> >>>>>>> first.two <- first.two[!duplicated(first.two$SYMBOL),]
> >>>>>>>
> >>>>>>> which will choose for you the first symbol -> gene ID mapping and
> >>>>>>> nuke the rest. That's nice and quick, but you are making huge
> >>>>>>> assumptions.
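For reference, base R's match() returns the position of the first match for each element. The same keep-the-first-mapping rule can therefore be applied while keeping df's length and row order, with NA for symbols that have no mapping.

Below is a minimal sketch on toy tables. The HBD and A1BG IDs come from outputs quoted in this thread; "NOTAGENE" is a made-up symbol.

```r
# Toy stand-ins: HBD/A1BG IDs are taken from this thread,
# "NOTAGENE" is a made-up symbol with no mapping.
first.two <- data.frame(SYMBOL   = c("HBD", "HBD", "A1BG"),
                        ENTREZID = c("3045", "100187828", "1"),
                        stringsAsFactors = FALSE)
df <- data.frame(GeneSymbol = c("A1BG", "HBD", "NOTAGENE", "HBD"),
                 Value1 = c(2.5, 3, 0.1, 0.7), stringsAsFactors = FALSE)

# For each df symbol, match() gives the row of its first hit in first.two
# (NA if none), so df keeps its length and row order, duplicates included
idx <- match(df$GeneSymbol, first.two$SYMBOL)
df$ENTREZID <- first.two$ENTREZID[idx]
df
```

This carries the same huge assumption as the !duplicated() version: for HBD it silently keeps 3045 and discards 100187828.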
> >>>>>>>
> >>>>>>> Or you could decide to be a bit more sophisticated and do something
> >>>>>>> like
> >>>>>>>
> >>>>>>> thelst <- tapply(1:nrow(first.two), first.two$SYMBOL,
> >>>>>>>                  function(x) first.two[x,])
> >>>>>>>
> >>>>>>> At this point you can take a look at, e.g., thelst[1:10] to see what
> >>>>>>> we just did
> >>>>>>>
> >>>>>>> thelst <- do.call("rbind", lapply(thelst, function(x) c(x[1,1],
> >>>>>>>                   paste(x[,2], collapse = "|"))))
> >>>>>>>
> >>>>>>> and here you can look at head(thelst).
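The tapply()/do.call() steps collapse multiple IDs per symbol into a single "|"-separated string. Below is a sketch of the same idea using split() and vapply(), with the pieces swapped in as noted.

The collapse step is followed by a name-based lookup, so the result lines up with df even when the two objects are not in the same order (split(), for instance, sorts by symbol). The toy tables reuse the HBD and KIR3DL3 IDs quoted further down; "NOTAGENE" is made up.

```r
first.two <- data.frame(SYMBOL   = c("HBD", "HBD", "KIR3DL3", "KIR3DL3", "A1BG"),
                        ENTREZID = c("3045", "100187828", "115653", "100133046", "1"),
                        stringsAsFactors = FALSE)
df <- data.frame(GeneSymbol = c("KIR3DL3", "A1BG", "NOTAGENE", "HBD"),
                 Value1 = c(2.5, 3, 0.1, 0.7), stringsAsFactors = FALSE)

# One string per symbol, with multiple Entrez IDs joined by "|"
collapsed <- vapply(split(first.two$ENTREZID, first.two$SYMBOL),
                    paste, character(1), collapse = "|")

# Look up by name instead of relying on row order; unmapped symbols give NA
df$ENTREZID <- unname(collapsed[df$GeneSymbol])
df
```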
> >>>>>>>
> >>>>>>> Then you can check to ensure that the first column of thelst is
> >>>>>>> identical to the first column of df, and proceed as before.
> >>>>>>>
> >>>>>>> But there is still the problem of the multiple mappings. As an example:
> >>>>>>>
> >>>>>>>> thelst[1:5]
> >>>>>>>
> >>>>>>> $HBD
> >>>>>>>        SYMBOL  ENTREZID
> >>>>>>> 2535    HBD      3045
> >>>>>>> 2536    HBD 100187828
> >>>>>>>
> >>>>>>> $KIR3DL3
> >>>>>>>          SYMBOL  ENTREZID
> >>>>>>> 17513 KIR3DL3    115653
> >>>>>>> 17514 KIR3DL3 100133046
> >>>>>>>
> >>>>>>>> mget(as.character(thelst[[1]][,2]), org.Hs.egGENENAME)
> >>>>>>>
> >>>>>>> $`3045`
> >>>>>>> [1] "hemoglobin, delta"
> >>>>>>>
> >>>>>>> $`100187828`
> >>>>>>> [1] "hypophosphatemic bone disease"
> >>>>>>>
> >>>>>>>> mget(as.character(thelst[[2]][,2]), org.Hs.egGENENAME)
> >>>>>>>
> >>>>>>> $`115653`
> >>>>>>> [1] "killer cell immunoglobulin-like receptor, three domains, long
> >>>>>>> cytoplasmic tail, 3"
> >>>>>>>
> >>>>>>> $`100133046`
> >>>>>>> [1] "killer cell immunoglobulin-like receptor three domains long
> >>>>>>> cytoplasmic
> >>>>>>> tail 3"
> >>>>>>>
> >>>>>>>
> >>>>>>> So HBD is the gene symbol for two different genes! If this gene
> >>>>>>> symbol is in your data, you will now have attributed your data to two
> >>>>>>> genes that apparently are not remotely similar. If KIR3DL3 is in your
> >>>>>>> data, then it worked out OK for that gene.
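The affected symbols can at least be listed automatically: count the distinct Entrez IDs per symbol and flag those with more than one. Below is a minimal sketch on a toy table using the IDs above.

It cannot tell the harmless KIR3DL3 case from the misleading HBD case. That still requires looking at the gene names, as done above with mget() on org.Hs.egGENENAME.

```r
first.two <- data.frame(SYMBOL   = c("HBD", "HBD", "KIR3DL3", "KIR3DL3", "A1BG"),
                        ENTREZID = c("3045", "100187828", "115653", "100133046", "1"),
                        stringsAsFactors = FALSE)

# Number of distinct Entrez IDs per symbol
n_ids <- vapply(split(first.two$ENTREZID, first.two$SYMBOL),
                function(v) length(unique(v)), integer(1))
ambiguous <- names(n_ids)[n_ids > 1]
ambiguous

# Flag affected rows in a (toy) data frame of results
df <- data.frame(GeneSymbol = c("A1BG", "HBD", "KIR3DL3"),
                 Value1 = c(2.5, 3, 0.1), stringsAsFactors = FALSE)
df$ambiguous <- df$GeneSymbol %in% ambiguous
df
```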
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jim
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> The real problem is that the number of rows is now different for
> >>>>>>>> the two objects:
> >>>>>>>>>
> >>>>>>>>> nrow(df); nrow(test)
> >>>>>>>>
> >>>>>>>> [1] 573
> >>>>>>>> [1] 201
> >>>>>>>>
> >>>>>>>> So I obviously can't put the new data into the original df. My
> >>>>>>>> impression is that when the one-to-many mapping arises, the select
> >>>>>>>> function exits with that warning message. As a result, my test
> >>>>>>>> object is incomplete.
> >>>>>>>>
> >>>>>>>> On top of that, and I can't really explain this, the row positions
> >>>>>>>> are messed up, e.g.
> >>>>>>>>
> >>>>>>>>> all.equal(df[100,],test[100,])
> >>>>>>>>
> >>>>>>>> does not return TRUE.
> >>>>>>>>
> >>>>>>>> How can I work around this?
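Before trying to line anything up, a useful first diagnostic is to compare per-key row counts in the input and in the select() output. This shows which keys were dropped (output count 0) or expanded.

Below is a base-R sketch. The test data frame is a toy stand-in for a truncated select() result, not real output.

```r
keys <- c("AGT", "AGT", "ADORA2A", "FAKE1")
# Toy stand-in for a select() result in which some query rows are missing
test <- data.frame(ALIAS    = c("AGT", "AGT", "ADORA2A"),
                   ENTREZID = c("183", "189", "135"),
                   stringsAsFactors = FALSE)

# Query keys that do not appear in the output at all
missing_keys <- setdiff(unique(keys), test$ALIAS)

# Rows per key in input vs output, over the same set of keys
all_keys <- sort(unique(c(keys, test$ALIAS)))
counts <- data.frame(key    = all_keys,
                     input  = as.vector(table(factor(keys, levels = all_keys))),
                     output = as.vector(table(factor(test$ALIAS, levels = all_keys))))
missing_keys
counts
```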
> >>>>>>>>
> >>>>>>>> Thanks a lot!
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> On 25 July 2013 16:58, James W. MacDonald <jmacdon at uw.edu> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Enrico,
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 7/25/2013 11:35 AM, Enrico Ferrero wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> I often have data frames where I need to perform ID conversions on
> >>>>>>>>>> one or more of the columns while preserving the order of the rows,
> >>>>>>>>>> e.g.:
> >>>>>>>>>>
> >>>>>>>>>> GeneSymbol    Value1    Value2
> >>>>>>>>>> GS1           2.5       0.1
> >>>>>>>>>> GS2           3         0.2
> >>>>>>>>>> ..
> >>>>>>>>>>
> >>>>>>>>>> And I want to obtain:
> >>>>>>>>>>
> >>>>>>>>>> GeneSymbol    EntrezGeneID    Value1    Value2
> >>>>>>>>>> GS1           EG1             2.5       0.1
> >>>>>>>>>> GS2           EG2             3         0.2
> >>>>>>>>>> ..
> >>>>>>>>>>
> >>>>>>>>>> What I've done so far was to create a function that uses
> >>>>>>>>>> org.Hs.eg.db to loop over the rows of the column and does the
> >>>>>>>>>> conversion:
> >>>>>>>>>>
> >>>>>>>>>> library(org.Hs.eg.db)
> >>>>>>>>>> alias2EG <- function(x) {
> >>>>>>>>>>     for (i in seq_along(x)) {
> >>>>>>>>>>         if (!is.na(x[i])) {
> >>>>>>>>>>             # keep only the first Entrez ID mapped to this alias
> >>>>>>>>>>             repl <- org.Hs.egALIAS2EG[[x[i]]][1]
> >>>>>>>>>>             if (!is.null(repl)) {
> >>>>>>>>>>                 x[i] <- repl
> >>>>>>>>>>             } else {
> >>>>>>>>>>                 x[i] <- NA
> >>>>>>>>>>             }
> >>>>>>>>>>         }
> >>>>>>>>>>     }
> >>>>>>>>>>     return(x)
> >>>>>>>>>> }
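Below is a vectorised sketch of the same function. It assumes the alias-to-Entrez mapping has first been pulled into a named list, for example with as.list(org.Hs.egALIAS2EG). Here a small hand-made list stands in for it, using the NAT1/NAT2/A1BG IDs from the ALIAS output quoted elsewhere in this thread.

Like the loop is meant to, it keeps only the first ID per alias and gives NA for NA or unmapped input. It does the lookup in a single indexing step. The names alias_map and alias2EG_vec are made up for this example.

```r
# Hand-made stand-in for as.list(org.Hs.egALIAS2EG): alias -> Entrez IDs
alias_map <- list(NAT1 = c("9", "1982", "6530", "10991"),
                  NAT2 = c("10", "81539"),
                  A1BG = "1")

# First ID per alias, as the loop's [1] does
first_id <- vapply(alias_map, function(ids) ids[1], character(1))

# Vectorised lookup: one element per input, NA for NA or unknown aliases
alias2EG_vec <- function(x) unname(first_id[as.character(x)])

alias2EG_vec(c("NAT2", NA, "A1BG", "NOPE", "NAT1"))
```

The per-alias work is done once when building first_id, so each conversion afterwards is a plain vector subset.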
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I should first note that gene symbols are not unique, so you are
> >>>>>>>>> taking a chance on your mappings. Is there no other annotation for
> >>>>>>>>> your data?
> >>>>>>>>>
> >>>>>>>>> In addition, you should note that it is almost always better to
> >>>>>>>>> think of objects as vectors and matrices in R, rather than as things
> >>>>>>>>> that need to be looped over (i.e., R isn't Perl or C).
> >>>>>>>>>
> >>>>>>>>> first.two <- select(org.Hs.eg.db, as.character(df$GeneSymbol),
> >>>>>>>>>                     "ENTREZID", "SYMBOL")
> >>>>>>>>>
> >>>>>>>>> Note that there used to be a warning or an error (I don't remember
> >>>>>>>>> which) when you did something like this, stating that gene symbols
> >>>>>>>>> are not unique and that you shouldn't do this sort of thing.
> >>>>>>>>> Apparently this warning has been removed, but the issue remains
> >>>>>>>>> valid.
> >>>>>>>>>
> >>>>>>>>> ## check yourself
> >>>>>>>>>
> >>>>>>>>> all.equal(as.character(df$GeneSymbol), first.two$SYMBOL)
> >>>>>>>>>
> >>>>>>>>> ## if TRUE, proceed
> >>>>>>>>>
> >>>>>>>>> df <- data.frame(first.two, df[,-1])
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Jim
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> and then call the function like this:
> >>>>>>>>>>
> >>>>>>>>>> df$EntrezGeneID <- alias2EG(df$GeneSymbol)
> >>>>>>>>>>
> >>>>>>>>>> This works well, but gets very slow when I need to do multiple
> >>>>>>>>>> conversions on large datasets.
> >>>>>>>>>>
> >>>>>>>>>> Is there any way I can achieve the same result in a quicker, more
> >>>>>>>>>> efficient way?
> >>>>>>>>>>
> >>>>>>>>>> Thank you.
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> James W. MacDonald, M.S.
> >>>>>>>>> Biostatistician
> >>>>>>>>> University of Washington
> >>>>>>>>> Environmental and Occupational Health Sciences
> >>>>>>>>> 4225 Roosevelt Way NE, # 100
> >>>>>>>>> Seattle WA 98105-6099
> >>>>>>>>>
> >>>>>>> --
> >>>>>>> James W. MacDonald, M.S.
> >>>>>>> Biostatistician
> >>>>>>> University of Washington
> >>>>>>> Environmental and Occupational Health Sciences
> >>>>>>> 4225 Roosevelt Way NE, # 100
> >>>>>>> Seattle WA 98105-6099
> >>>>>>>
> >>>>>>
> >>>> --
> >>>> Hervé Pagès
> >>>>
> >>>> Program in Computational Biology
> >>>> Division of Public Health Sciences
> >>>> Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N, M1-B514
> >>>> P.O. Box 19024
> >>>> Seattle, WA 98109-1024
> >>>>
> >>>> E-mail: hpages at fhcrc.org
> >>>> Phone:  (206) 667-5791
> >>>> Fax:    (206) 667-1319
> >>>
> >>>
> >>>
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fhcrc.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> 
> 
> 
> -- 
> Enrico Ferrero
> PhD Student
> Steve Russell Lab - Department of Genetics
> FlyChip - Cambridge Systems Biology Centre
> University of Cambridge
> 
> e.ferrero at gen.cam.ac.uk
> http://flypress.gen.cam.ac.uk/
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor