[BioC] upload multiple genelists at a time for GO or pathway enrichment

Pekka Kohonen pkpekka at gmail.com
Mon Oct 28 15:54:37 CET 2013


Hello Helen,

If you can carry out one GO enrichment analysis in R, you can carry
out 97 almost equally well. You just need a script that does it
once. Then you generate a list of your gene sets, and then you can
use R's lapply on the list. The syntax is:

listofresultmatrices <- lapply(listofgenesets, function(s) {

    # your entire script for doing the analysis on one gene list `s`;
    # its last expression should be the result matrix

})

Your script should produce a matrix as its output, so you end up with a
list of 97 result matrices. You can then script some sort of
summarization function.
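
To make the pattern concrete, here is a minimal sketch in base R. Everything in it is made up for illustration: the toy universe, the term2genes annotation, the two clusters and the helper enrich_one are assumptions, and a plain hypergeometric test stands in for whatever analysis your own script runs.

```r
# Hypothetical toy data: background genes, term annotations, two clusters.
universe <- paste0("g", 1:20)
term2genes <- list(
  termA = paste0("g", 1:5),
  termB = paste0("g", 4:12)
)
listofgenesets <- list(
  cluster1 = paste0("g", 1:4),
  cluster2 = paste0("g", 10:15)
)

# Hypothetical helper: the "script for one gene list".
# Returns a matrix with one row per term.
enrich_one <- function(genes, term2genes, universe) {
  genes <- intersect(genes, universe)
  t(sapply(term2genes, function(term) {
    term <- intersect(term, universe)
    k <- length(intersect(genes, term))  # genes in both cluster and term
    # Upper-tail hypergeometric probability P(X >= k)
    p <- phyper(k - 1, length(term), length(universe) - length(term),
                length(genes), lower.tail = FALSE)
    c(annotated = length(term), overlap = k, p.value = p)
  }))
}

# One result matrix per cluster.
listofresultmatrices <- lapply(listofgenesets, enrich_one,
                               term2genes = term2genes, universe = universe)

# A simple summarization: the term with the smallest p-value per cluster.
topterm <- sapply(listofresultmatrices,
                  function(m) rownames(m)[which.min(m[, "p.value"])])
```

With 97 clusters only listofgenesets changes; the lapply call and the summarization stay the same.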

I have done this for topGO analysis: you can search for
"label:bioconductor topGO Pekka". It uses lists of gene symbols, a
vector of Ensembl gene IDs for the background and GO "Biological
Process" categories. topGO is a nice package because it avoids the
problem with GO enrichment (and all hypergeometric enrichment analyses)
that the largest, least informative gene sets always come out as the
top results. This probably happens because the hypergeometric method
does not properly take into account inter-gene correlations and does
not weight results according to the significance of differential
expression (how could it, since it only has information on gene
names?).

Best, Pekka

2013/10/28 Helen Smith <helen.smith-2 at manchester.ac.uk>:
> Hi All,
>
> I have a list of clusters generated through BioLayout which I need to annotate with GO terms.  I can do this one-by-one using DAVID etc., but as there are 97 clusters this is a rather lengthy procedure.
> Do any of you know of any tools where you can upload multiple lists of genes to be tested for GO over-representation (or, failing that, pathway enrichment)?
>
> Any help would be greatly appreciated,
> Helen
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of James W. MacDonald
> Sent: 28 October 2013 13:50
> To: DR DHANJIT KUMAR DAS [guest]
> Cc: bioconductor at r-project.org; dasdjk at gmail.com
> Subject: Re: [BioC] Installing library package (Marmoset Gene 1.0 ST Array) into R environment
>
> You should be using either the oligo or xps package for this array. For oligo, you need the pdInfoBuilder package, and some files from Affymetrix.
>
> You need the pgf, clf and mps files that are in this zipfile:
>
> http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/MarGene-1_0-st_rev01/MarGene-1_0-st_rev01.zip
>
> And you need the probeset csv file
>
> http://www.affymetrix.com/Auth/analysis/downloads/na33/wtgene-32_2/MarGene-1_0-st-v1.na33.2.caljac3.probeset.csv.zip
>
> and the transcript csv file
>
> http://www.affymetrix.com/Auth/analysis/downloads/na33/wtgene-32_2/MarGene-1_0-st-v1.na33.2.caljac3.transcript.csv.zip
>
> and then you can make the pd.margene.1.0.st.v1 package following these
> instructions:
>
> https://stat.ethz.ch/pipermail/bioconductor/2013-March/051335.html
>
> after which you can install using
>
> install.packages("pd.margene.1.0.st.v1/", repos=NULL, type="source")
>
> If you want to use xps there are a set of vignettes here:
>
> http://www.bioconductor.org/packages/release/bioc/html/xps.html
>
> and Christian Stratowa is very helpful, so you can ask questions here if you get stuck.
>
> Best,
>
> Jim
>
>
>
>
> On Monday, October 28, 2013 6:12:15 AM, DR DHANJIT KUMAR DAS [guest]
> wrote:
>>
>> I would like to analyze data using Marmoset Gene 1.0 ST Array. While reading the CEL file into R environment, error message is showing as "Library - package margene10stcdf not installed"
>>
>> How to install the package MarGene-1_0st package into R environment? The R script is attached for your reference.
>>
>> The part No of the array is 901961.
>>
>>   -- output of sessionInfo():
>>
>> > library(affy)
>> > eset.rma <- justRMA(celfile.path="C:/Dhanjit/Marmoset-Dr Uddhav/CEL FILE/")
>> Error in getCdfInfo(object) :
>>    Could not obtain CDF environment, problems encountered:
>> Specified environment does not contain MarGene-1_0-st
>> Library - package margene10stcdf not installed
>> Bioconductor - margene10stcdf not available
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>


