[BioC] Annotation.db: how automatically call a mapping?

Tue Jun 30 22:28:46 CEST 2009

Hooiveld, Guido wrote:
> Hi Jim,
> Since your suggestion looks indeed easy and seems to provide everything
> I would like to have, a gave it a try (I'll have a further look at
> Martin's comments/suggestions later).
> However, it seems that 'probes2table' cannot properly handle multiple
> contrast defined in limma:
> 
>> library(affycoretools)
> Loading required package: GO.db
> Loading required package: DBI
> Loading required package: KEGG.db
>> probes2table(eset, featureNames(eset), annotation(eset),
> list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = FALSE,
> filename = "output")
> Loading required package: moe430a.db
> Error in aafTable(items = otherdata) : 
>   All columns must be of equal length
>> length(fit2$p.value)
> [1] 68070
>> length(fit2$Amean)
> [1] 22690
> 
> I analyzed three contrasts in limma, and 68070/3 is 22690, which exactly
> equals the number of probesets on the Affy MOE430A array. This thus
> explains the error.
> 
> Question: can this easily be solved? Can limma2annaffy handle multiple
> contrasts? (at the moment I am not familiar with affycoretools at all).

limma2annaffy(eset, fit2, <designmatrixname>, <contrastsmatrixname>, 
annotation(eset), pfilt=1, html = FALSE, interactive = FALSE)

should give you text tables for each contrast. I would normally use a 
reasonable p-value and possibly a fold filter to reduce the output to 
the reporters of interest, and use html = TRUE because my end users in 
general like that better than the text. However, for all reporters on 
this chip the HTML table is too huge to be of use.

> 
> Thanks,
> Guido  
> 
>  
> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch 
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
>> James W. MacDonald
>> Sent: 30 June 2009 19:44
>> To: Hooiveld, Guido
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
>>
>> Hi Guido,
>>
>> Hooiveld, Guido wrote:
>>> Hi Martin,
>>>
>>> Indeed, another useful, straigh-forward possibility for mapping. 
>>> However, I am now facing the problem of properly combining the 
>>> annotation info with the expression data. This is what I 
>> would like to
>>> do:
>>>
>>>> Tab_data <- exprs(eset[probeids])
>>>> Tab_data <- cbind(Tab_data, fit2$Amean) # to add average 
>> expression 
>>>> of
>>> LIMMA output
>>>> Tab_data <- cbind(Tab_data, fit2$p.value) # to add p-value of LIMMA
>>> output
>>> etc.
>>>
>>> This al goes fine, however adding the annotation info 
>> 'mixes-up' the 
>>> content of Tab_data; the annotation data replaces the first 
>> column of 
>>> Tab_data, and the content of all cells is replaced by 'null'. I 
>>> suspect it has something to do with the type of object I 
>> would like to 
>>> merge, but I am not sure.
>>>
>>>> map.entrez <- getAnnMap("ENTREZID", annotation(eset)) 
>> map.entrez <- 
>>>> as.list(map.entrez[probeids])
>> This sort of thing is going to get really difficult to do by 
>> hand when you get to things that have a one-to-many 
>> relationship. And you are already duplicating existing 
>> efforts with what you have done so far.
>>
>> If you want to combine annotation data with results data, you 
>> really want to be using the annaffy package which does lots 
>> of these things seamlessly. And if you want things to be a 
>> bit easier, you could consider using the affycoretools 
>> package as well, which for the most part uses annaffy to 
>> create output.
>>
>> You can do what you appear to want in one line:
>>
>> probes2table(eset, featureNames(eset), annotation(eset), 
>> list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = 
>> FALSE, filename = "output")
>>
>> You might need to play around with the 'anncols' argument to 
>> get what annotation data you might want.
>>
>> If you want output specific to the contrasts you have fit, 
>> see ?limma2annaffy.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>>> Tab_data <- cbind(Tab_data, map.entrez)
>>>   ^ in R this seems to work, but when saved as .txt the content of 
>>> Tab_data is completely mixed up. Before 'adding' map.entrez 
>> Tab_dat is 
>>> OK.
>>>
>>>
>>>> write.table(cbind(rownames(Tab_data2), Tab_data2),
>>> file="test_1234.txt", sep="\t", col.names=TRUE, row.names=FALSE)
>>>
>>>> class(Tab_data)
>>> [1] "matrix"
>>>> class(map.entrez)
>>> [1] "list"
>>>
>>>
>>> Do you, or someone elsr, have a suggestion how to properly 
>> link these 
>>> two types of data?
>>> Thanks again,
>>> Guido
>>>
>>>
>>>
>>>  
>>>
>>>> -----Original Message-----
>>>> From: bioconductor-bounces at stat.math.ethz.ch
>>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf 
>> Of Martin 
>>>> Morgan
>>>> Sent: 30 June 2009 00:00
>>>> To: Hooiveld, Guido
>>>> Cc: bioconductor at stat.math.ethz.ch
>>>> Subject: Re: [BioC] Annotation.db: how automatically call 
>> a mapping?
>>>> Hooiveld, Guido wrote:
>>>>> Hi,
>>>>>  
>>>>> I am facing a problem i cannot solve myselves, despite 
>> everything i 
>>>>> read/know. But i assume the solution is easy for the more
>>>> knowledgable
>>>>> folks in BioC/R...
>>>>>  
>>>>> This does work:
>>>>>> library(moe430a.db)
>>>>>> xxyy <- moe430aSYMBOL
>>>>>> xxyy
>>>>> SYMBOL map for chip moe430a (object of class "AnnDbBimap")
>>>>>  
>>>>> However, for this to work you need to know the array type
>>>> of the data
>>>>> that is analyzed.
>>>>>  
>>>>>  
>>>>> Now i would like to automatically extract the (e.g.) 
>> SYMBOL mapping 
>>>>> from an annotation.db, thus by retrieving the array type
>>>> from the eset.
>>>>>  
>>>>>> library(affy)
>>>>>> eset <- rma(data)
>>>>>> probeids <- featureNames(eset)
>>>>>> annotation(eset)
>>>>> [1] "moe430a"
>>>>>  
>>>>> But how can i use this info to properly call the SYMBOL mapping?
>>>> Hi Guido --
>>>>
>>>> to get the appropriate map
>>>>
>>>>   library(annotate)
>>>>   map = getAnnMap("SYMBOL", annotation(eset))
>>>>
>>>> to select just the relevant probes
>>>>
>>>>   map[probeids]
>>>>
>>>> toTable(map[probeids]) or as.list(map[probeids]) might be the next 
>>>> step in the work flow.
>>>>
>>>> Martin
>>>>
>>>>>  
>>>>> I tried this:
>>>>>> arraytype <- annotation(eset)
>>>>>> arraytype <- paste(arraytype, "db", sep = ".") arraytype
>>>>> [1] "moe430a.db"
>>>>>> arraytype <- paste("package", arraytype, sep = ":") gh <-
>>>>>> ls(arraytype) gh
>>>>>  [1] "moe430a"              "moe430a_dbconn"       
>> "moe430a_dbfile"
>>>>> "moe430a_dbInfo"       "moe430a_dbschema"     "moe430aACCNUM"
>>>>> "moe430aALIAS2PROBE"   "moe430aCHR"           "moe430aCHRLENGTHS"
>>>>> "moe430aCHRLOC"       
>>>>> [11] "moe430aCHRLOCEND"     "moe430aENSEMBL"
>>>>> "moe430aENSEMBL2PROBE" "moe430aENTREZID"      "moe430aENZYME"
>>>>> "moe430aENZYME2PROBE"  "moe430aGENENAME"      "moe430aGO"
>>>>> "moe430aGO2ALLPROBES"  "moe430aGO2PROBE"     
>>>>> [21] "moe430aMAP"           "moe430aMAPCOUNTS"     "moe430aMGI"
>>>>> "moe430aMGI2PROBE"     "moe430aORGANISM"      "moe430aPATH"
>>>>> "moe430aPATH2PROBE"    "moe430aPFAM"          "moe430aPMID"
>>>>> "moe430aPMID2PROBE"   
>>>>> [31] "moe430aPROSITE"       "moe430aREFSEQ"        "moe430aSYMBOL"
>>>>> "moe430aUNIGENE"       "moe430aUNIPROT"
>>>>>  
>>>>>> gh[33]
>>>>> [1] "moe430aSYMBOL"
>>>>>> symbols <- mget(probeids, gh[33])
>>>>> Error in mget(probeids, gh[33]) : second argument must be an 
>>>>> environment
>>>>>  
>>>>> This also doesn't work:
>>>>>> symbols <- mget(probeids, envir=gh[33])
>>>>> Error in mget(probeids, envir = gh[33]) : 
>>>>>   second argument must be an environment
>>>>>  
>>>>> My approach thus is the wrong approach to automatically extract 
>>>>> mappings from a annotation.db.
>>>>> Since i don't know about any other possibility, i would
>>>> appreciate if
>>>>> someone could point me to a working solution.
>>>>>  
>>>>> Thanks,
>>>>> Guido
>>>>>  
>>>>>
>>>>> ------------------------------------------------
>>>>> Guido Hooiveld, PhD
>>>>> Nutrition, Metabolism & Genomics Group Division of Human 
>> Nutrition 
>>>>> Wageningen University Biotechnion, Bomenweg 2
>>>>> NL-6703 HD Wageningen
>>>>> the Netherlands
>>>>> tel: (+)31 317 485788
>>>>> fax: (+)31 317 483342 
>>>>> internet:   http://nutrigene.4t.com <http://nutrigene.4t.com/>  
>>>>> email:      guido.hooiveld at wur.nl 
>>>>>
>>>>>
>>>>>
>>>>> 	[[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: 
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826