[BioC] Annotation.db: how automatically call a mapping?

James W. MacDonald jmacdon at med.umich.edu
Wed Jul 1 15:14:54 CEST 2009



Hooiveld, Guido wrote:
> Hi again,
> 
> Some remarks/questions:
> 
> First of all, thanks for all your help!

No problem. But do note that I am promoting my own package ;-D

> 
> Meanwhile I got limma2annaffy to work. Nice HTML output! However some
> minor things: 
> - I experienced that the library 'annaffy' is not automatically loaded.

No. For BioC 2.4 we made a concerted effort to reduce the search path 
that a given package creates by importing functions from required 
packages rather than loading them all into the search path (and 
affycoretools was loading _lots_ of packages).

This is a double-edged sword; the imported package isn't loaded, so the 
end user doesn't have access to the package functions, and may well have 
to load the package anyway. But it is better to allow the end user to 
clutter his/her search path to whatever extent desired, rather than 
doing it for them.

> - how to get the scientific notation for the p-values (e.g. 1.24E-12;
> for all 30 genes it is now "0").

You can't, using limma2annaffy(). Right now it is hard-coded to round 
p-values to three decimal places. So anything less than 1e-3 will be 
rounded to zero. You could do so using probes2table(), but you will have 
to do some of the leg work to set up the data to output.

> - in the output file, the header 'Fold Change' actually reflects
> 'log2FC'.

True.

> 
> 
> Regarding the function probes2table, how to access the results [...html
> = FALSE, filename = "output")]? 
> Is it (=output) automatically saved? If so, where?; it is not in my
> working directory.

Oh yeah, I forgot. The default is text = FALSE as well, so you need to 
set either html = TRUE or text = TRUE (or both) to get output.

Best,

Jim



> 
> Thanks,
> Guido
> 
> 
>> limma2annaffy(eset, fit2, design, cont.matrix, "moe430a", adjust =
> "fdr", 
> anncols = aaf.handler()[c(1:3, 6:7, 9:12)], number = 30, pfilt = NULL, 
> fldfilt= NULL,tstat = TRUE, pval = TRUE, FC = TRUE, expression = TRUE, 
> html = TRUE, text = FALSE, save = FALSE, addname = NULL, interactive =
> TRUE)
> 
> You are going to output 3 tables, 
> With this many genes in each:
>  30 30 30
> Do you want to accept or change these values? [ a/c ]a
> Error in as.vector(x) : could not find function "aaf.handler"
>> library(annaffy)
>> limma2annaffy(eset, fit2, design, cont.matrix, "moe430a", adjust =
> "fdr", 
> anncols = aaf.handler()[c(1:3, 6:7, 9:12)], number = 30, pfilt = NULL, 
> fldfilt= NULL,tstat = TRUE, pval = TRUE, FC = TRUE, expression = TRUE, 
> html = TRUE, text = FALSE, save = FALSE, addname = NULL, interactive =
> TRUE)
> 
> You are going to output 3 tables, 
> With this many genes in each:
>  30 30 30
> Do you want to accept or change these values? [ a/c ]a
> 
> 
>  
> 
> 
>> -----Original Message-----
>> From: James W. MacDonald [mailto:jmacdon at med.umich.edu] 
>> Sent: 30 June 2009 22:29
>> To: Hooiveld, Guido
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
>>
>>
>>
>> Hooiveld, Guido wrote:
>>> Hi Jim,
>>> Since your suggestion looks indeed easy and seems to provide 
>>> everything I would like to have, a gave it a try (I'll have 
>> a further 
>>> look at Martin's comments/suggestions later).
>>> However, it seems that 'probes2table' cannot properly 
>> handle multiple 
>>> contrast defined in limma:
>>>
>>>> library(affycoretools)
>>> Loading required package: GO.db
>>> Loading required package: DBI
>>> Loading required package: KEGG.db
>>>> probes2table(eset, featureNames(eset), annotation(eset),
>>> list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = FALSE, 
>>> filename = "output") Loading required package: moe430a.db Error in 
>>> aafTable(items = otherdata) :
>>>   All columns must be of equal length
>>>> length(fit2$p.value)
>>> [1] 68070
>>>> length(fit2$Amean)
>>> [1] 22690
>>>
>>> I analyzed three contrasts in limma, and 68070/3 is 22690, which 
>>> exactly equals the number of probesets on the Affy MOE430A 
>> array. This 
>>> thus explains the error.
>>>
>>> Question: can this easily be solved? Can limma2annaffy 
>> handle multiple 
>>> contrasts? (at the moment I am not familiar with 
>> affycoretools at all).
>>
>> limma2annaffy(eset, fit2, <designmatrixname>, 
>> <contrastsmatrixname>, annotation(eset), pfilt=1, html = 
>> FALSE, interactive = FALSE)
>>
>> should give you text tables for each contrast. I would 
>> normally use a reasonable p-value and possibly a fold filter 
>> to reduce the output to the reporters of interest, and use 
>> html = TRUE because my end users in general like that better 
>> than the text. However, for all reporters on this chip the 
>> HTML table is too huge to be of use.
>>
>>> Thanks,
>>> Guido
>>>
>>>  
>>>
>>>> -----Original Message-----
>>>> From: bioconductor-bounces at stat.math.ethz.ch
>>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf 
>> Of James W. 
>>>> MacDonald
>>>> Sent: 30 June 2009 19:44
>>>> To: Hooiveld, Guido
>>>> Cc: bioconductor at stat.math.ethz.ch
>>>> Subject: Re: [BioC] Annotation.db: how automatically call 
>> a mapping?
>>>> Hi Guido,
>>>>
>>>> Hooiveld, Guido wrote:
>>>>> Hi Martin,
>>>>>
>>>>> Indeed, another useful, straigh-forward possibility for mapping. 
>>>>> However, I am now facing the problem of properly combining the 
>>>>> annotation info with the expression data. This is what I
>>>> would like to
>>>>> do:
>>>>>
>>>>>> Tab_data <- exprs(eset[probeids])
>>>>>> Tab_data <- cbind(Tab_data, fit2$Amean) # to add average
>>>> expression
>>>>>> of
>>>>> LIMMA output
>>>>>> Tab_data <- cbind(Tab_data, fit2$p.value) # to add 
>> p-value of LIMMA
>>>>> output
>>>>> etc.
>>>>>
>>>>> This al goes fine, however adding the annotation info
>>>> 'mixes-up' the
>>>>> content of Tab_data; the annotation data replaces the first
>>>> column of
>>>>> Tab_data, and the content of all cells is replaced by 'null'. I 
>>>>> suspect it has something to do with the type of object I
>>>> would like to
>>>>> merge, but I am not sure.
>>>>>
>>>>>> map.entrez <- getAnnMap("ENTREZID", annotation(eset))
>>>> map.entrez <-
>>>>>> as.list(map.entrez[probeids])
>>>> This sort of thing is going to get really difficult to do by hand 
>>>> when you get to things that have a one-to-many 
>> relationship. And you 
>>>> are already duplicating existing efforts with what you 
>> have done so 
>>>> far.
>>>>
>>>> If you want to combine annotation data with results data, 
>> you really 
>>>> want to be using the annaffy package which does lots of 
>> these things 
>>>> seamlessly. And if you want things to be a bit easier, you could 
>>>> consider using the affycoretools package as well, which 
>> for the most 
>>>> part uses annaffy to create output.
>>>>
>>>> You can do what you appear to want in one line:
>>>>
>>>> probes2table(eset, featureNames(eset), annotation(eset), 
>>>> list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = FALSE, 
>>>> filename = "output")
>>>>
>>>> You might need to play around with the 'anncols' argument 
>> to get what 
>>>> annotation data you might want.
>>>>
>>>> If you want output specific to the contrasts you have fit, see 
>>>> ?limma2annaffy.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>>> Tab_data <- cbind(Tab_data, map.entrez)
>>>>>   ^ in R this seems to work, but when saved as .txt the 
>> content of 
>>>>> Tab_data is completely mixed up. Before 'adding' map.entrez
>>>> Tab_dat is
>>>>> OK.
>>>>>
>>>>>
>>>>>> write.table(cbind(rownames(Tab_data2), Tab_data2),
>>>>> file="test_1234.txt", sep="\t", col.names=TRUE, row.names=FALSE)
>>>>>
>>>>>> class(Tab_data)
>>>>> [1] "matrix"
>>>>>> class(map.entrez)
>>>>> [1] "list"
>>>>>
>>>>>
>>>>> Do you, or someone elsr, have a suggestion how to properly
>>>> link these
>>>>> two types of data?
>>>>> Thanks again,
>>>>> Guido
>>>>>
>>>>>
>>>>>
>>>>>  
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioconductor-bounces at stat.math.ethz.ch
>>>>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf
>>>> Of Martin
>>>>>> Morgan
>>>>>> Sent: 30 June 2009 00:00
>>>>>> To: Hooiveld, Guido
>>>>>> Cc: bioconductor at stat.math.ethz.ch
>>>>>> Subject: Re: [BioC] Annotation.db: how automatically call
>>>> a mapping?
>>>>>> Hooiveld, Guido wrote:
>>>>>>> Hi,
>>>>>>>  
>>>>>>> I am facing a problem i cannot solve myselves, despite
>>>> everything i
>>>>>>> read/know. But i assume the solution is easy for the more
>>>>>> knowledgable
>>>>>>> folks in BioC/R...
>>>>>>>  
>>>>>>> This does work:
>>>>>>>> library(moe430a.db)
>>>>>>>> xxyy <- moe430aSYMBOL
>>>>>>>> xxyy
>>>>>>> SYMBOL map for chip moe430a (object of class "AnnDbBimap")
>>>>>>>  
>>>>>>> However, for this to work you need to know the array type
>>>>>> of the data
>>>>>>> that is analyzed.
>>>>>>>  
>>>>>>>  
>>>>>>> Now i would like to automatically extract the (e.g.)
>>>> SYMBOL mapping
>>>>>>> from an annotation.db, thus by retrieving the array type
>>>>>> from the eset.
>>>>>>>  
>>>>>>>> library(affy)
>>>>>>>> eset <- rma(data)
>>>>>>>> probeids <- featureNames(eset)
>>>>>>>> annotation(eset)
>>>>>>> [1] "moe430a"
>>>>>>>  
>>>>>>> But how can i use this info to properly call the SYMBOL mapping?
>>>>>> Hi Guido --
>>>>>>
>>>>>> to get the appropriate map
>>>>>>
>>>>>>   library(annotate)
>>>>>>   map = getAnnMap("SYMBOL", annotation(eset))
>>>>>>
>>>>>> to select just the relevant probes
>>>>>>
>>>>>>   map[probeids]
>>>>>>
>>>>>> toTable(map[probeids]) or as.list(map[probeids]) might 
>> be the next 
>>>>>> step in the work flow.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>>  
>>>>>>> I tried this:
>>>>>>>> arraytype <- annotation(eset)
>>>>>>>> arraytype <- paste(arraytype, "db", sep = ".") arraytype
>>>>>>> [1] "moe430a.db"
>>>>>>>> arraytype <- paste("package", arraytype, sep = ":") gh <-
>>>>>>>> ls(arraytype) gh
>>>>>>>  [1] "moe430a"              "moe430a_dbconn"       
>>>> "moe430a_dbfile"
>>>>>>> "moe430a_dbInfo"       "moe430a_dbschema"     "moe430aACCNUM"
>>>>>>> "moe430aALIAS2PROBE"   "moe430aCHR"           
>> "moe430aCHRLENGTHS"
>>>>>>> "moe430aCHRLOC"       
>>>>>>> [11] "moe430aCHRLOCEND"     "moe430aENSEMBL"
>>>>>>> "moe430aENSEMBL2PROBE" "moe430aENTREZID"      "moe430aENZYME"
>>>>>>> "moe430aENZYME2PROBE"  "moe430aGENENAME"      "moe430aGO"
>>>>>>> "moe430aGO2ALLPROBES"  "moe430aGO2PROBE"     
>>>>>>> [21] "moe430aMAP"           "moe430aMAPCOUNTS"     "moe430aMGI"
>>>>>>> "moe430aMGI2PROBE"     "moe430aORGANISM"      "moe430aPATH"
>>>>>>> "moe430aPATH2PROBE"    "moe430aPFAM"          "moe430aPMID"
>>>>>>> "moe430aPMID2PROBE"   
>>>>>>> [31] "moe430aPROSITE"       "moe430aREFSEQ"        
>> "moe430aSYMBOL"
>>>>>>> "moe430aUNIGENE"       "moe430aUNIPROT"
>>>>>>>  
>>>>>>>> gh[33]
>>>>>>> [1] "moe430aSYMBOL"
>>>>>>>> symbols <- mget(probeids, gh[33])
>>>>>>> Error in mget(probeids, gh[33]) : second argument must be an 
>>>>>>> environment
>>>>>>>  
>>>>>>> This also doesn't work:
>>>>>>>> symbols <- mget(probeids, envir=gh[33])
>>>>>>> Error in mget(probeids, envir = gh[33]) : 
>>>>>>>   second argument must be an environment
>>>>>>>  
>>>>>>> My approach thus is the wrong approach to automatically extract 
>>>>>>> mappings from a annotation.db.
>>>>>>> Since i don't know about any other possibility, i would
>>>>>> appreciate if
>>>>>>> someone could point me to a working solution.
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> Guido
>>>>>>>  
>>>>>>>
>>>>>>> ------------------------------------------------
>>>>>>> Guido Hooiveld, PhD
>>>>>>> Nutrition, Metabolism & Genomics Group Division of Human
>>>> Nutrition
>>>>>>> Wageningen University Biotechnion, Bomenweg 2
>>>>>>> NL-6703 HD Wageningen
>>>>>>> the Netherlands
>>>>>>> tel: (+)31 317 485788
>>>>>>> fax: (+)31 317 483342 
>>>>>>> internet:   http://nutrigene.4t.com <http://nutrigene.4t.com/>  
>>>>>>> email:      guido.hooiveld at wur.nl 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 	[[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: 
>>>>>>>
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: 
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: 
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> Douglas Lab
>>>> University of Michigan
>>>> Department of Human Genetics
>>>> 5912 Buhl
>>>> 1241 E. Catherine St.
>>>> Ann Arbor MI 48109-5618
>>>> 734-615-7826
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>
>>
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826



More information about the Bioconductor mailing list