[BioC] filtering on external genelist

Martin Morgan mtmorgan at fhcrc.org
Tue Sep 11 23:47:37 CEST 2007


"James W. MacDonald" <jmacdon at med.umich.edu> writes:

> D wrote:
>> Oleg Moskvin <ovm at ...> writes:
>> 
>>> Colleagues,
>>>
>>> I think this should be a pretty simple task, but I cannot find an appropriate
>>> package for it.
>>> I need to generate a subset of an eSet object that contains certain probesets
>>> listed in an external gene list (outside the R environment).
>>>
>>> I.e., the procedure should look something like this:
>>>
>>> mylist <- read.table .....
>>> filtered.eset <- someFunction(eSet, mylist)
>>>
>>> Probably this is already implemented somewhere.
>>> Any hints will be appreciated.
>>>
>>> All the best,
>>>
>>> Oleg
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at ...
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>> 
>> 
>> I have the exact same question. However, I am working with two-color data in
>> limma. I'd like to be able to make a table of M-values corresponding to a list
>> of gene IDs from an external table. Any help is appreciated.
>
> That is not the same question, really. Your question should be easily 
> answered by reading 'An Introduction to R', as that is a simple 
> subsetting problem.

It may be helpful to know that MALists contain, or can be made to contain
(e.g., when reading in the original data files), whatever additional
annotation the manufacturer provides. You might then do something like
the following (the details depend entirely on how the MAList object was
created):

> idx <- ma$genes$Labels %in% c("EST1", "Actin")
> ma1 <- ma[idx,]

The first line creates a logical index (TRUE for each spot whose label is
in the list) and the second uses it to subset the rows of the MAList.
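
For readers who want to try the pattern without limma or real data, here
is a minimal base-R sketch. The object `ma` below is a hypothetical
stand-in for an MAList (a plain list holding an `M` matrix and a `genes`
data frame); the `Labels` column and all values are made up. With a real
MAList, `ma[idx, ]` subsets the matrix components and `genes` together,
so the manual steps here are only for illustration.

```r
# Hypothetical stand-in for an MAList: M-values plus per-spot annotation
ma <- list(
  M = matrix(c(0.5, -1.2,  2.1, 0.0,    # array1
               1.1,  0.3, -0.7, 1.8),   # array2
             nrow = 4,
             dimnames = list(NULL, c("array1", "array2"))),
  genes = data.frame(Labels = c("EST1", "GAPDH", "Actin", "EST2"),
                     stringsAsFactors = FALSE)
)

# Logical index: TRUE for spots whose label appears in the external list
wanted <- c("EST1", "Actin")
idx <- ma$genes$Labels %in% wanted

# Table of M-values for the wanted spots, with gene labels as row names;
# drop = FALSE keeps a matrix even if only one spot matches
mtab <- ma$M[idx, , drop = FALSE]
rownames(mtab) <- ma$genes$Labels[idx]
print(mtab)
```

`%in%` returns a logical vector the same length as `ma$genes$Labels`, so
spot order is preserved and duplicated labels are all kept.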

> The answer to the original question is also pretty simple. I don't know 
> if this is documented somewhere, but I think the principle of least 
> surprise applies here:
>
> mylist <- read.table("my_external_list", as.is = TRUE)[[1]]  # character vector of IDs
> filtered.eset <- original.eset[mylist,]
>
> As an example:
>
>  > library(fibroEset)
>  > data(fibroEset)
>  > thenames <- featureNames(fibroEset)[sample(1:12625, 300)]
>  > subsetted.eset <- fibroEset[thenames,]
>  > subsetted.eset
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 300 features, 46 samples
>    element names: exprs
> phenoData
>    sampleNames: 1, 2, ..., 46 (46 total)
>    varLabels and varMetadata:
>      samp: sample code
>      species: h: human, b: bonobo, g: gorilla
> featureData
>    rowNames: 37599_at, 34494_at, ..., 36333_at (300 total)
>    varLabels and varMetadata: none
> experimentData: use 'experimentData(object)'
>    pubMedIds: 12840040
> Annotation [1] "hgu95av2"
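
A self-contained base-R sketch of the same idea, reading IDs from a file
and subsetting rows by name. The matrix `expr` is a made-up stand-in for
`exprs(eset)`, and the file is a temporary one written for the example.
It also guards against IDs in the list that are absent from the data:
indexing a matrix by a row name it lacks is an error ("subscript out of
bounds"), and an ExpressionSet presumably behaves similarly, so it seems
safest to intersect first.

```r
# Made-up expression matrix with probeset IDs as row names
set.seed(1)
ids <- c("1000_at", "1001_at", "1002_f_at", "1003_s_at", "1004_at")
expr <- matrix(rnorm(length(ids) * 3), nrow = length(ids),
               dimnames = list(ids, paste0("s", 1:3)))

# Hypothetical external gene list, one ID per line
listfile <- tempfile(fileext = ".txt")
writeLines(c("1001_at", "1004_at", "9999_at"), listfile)

# read.table() returns a data frame; take its first column.
# as.is = TRUE keeps the IDs as character rather than factor, which
# matters: indexing with a factor uses its integer codes, not its labels.
mylist <- read.table(listfile, as.is = TRUE)[[1]]

# Keep only IDs present in the data, and record the ones that are not
found   <- intersect(mylist, rownames(expr))
missing <- setdiff(mylist, rownames(expr))

filtered.expr <- expr[found, , drop = FALSE]
print(filtered.expr)
if (length(missing) > 0) message("Not found: ", paste(missing, collapse = ", "))
```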

It is a bit trickier when thenames are not probeset IDs. One can use the
maps in the annotation package to get there, though, e.g., starting from
SYMBOL:

> library(hgu95av2)
> rmap <- l2e(reverseSplit(as.list(hgu95av2SYMBOL)))
> head(ls(rmap))
[1] "2'-PDE" "3.8-1"  "76P"    "AADAC"  "AAK1"   "AAMP"  
> rmap[["AADAC"]]
[1] "36512_at"
> thenames <- head(ls(rmap)) # the symbols we're looking for
> mget(thenames, rmap)
$`2'-PDE`
[1] "38144_at"

$`3.8-1`
[1] "34934_at"

$`76P`
[1] "40985_g_at" "40986_s_at" "40984_at"  

$AADAC
[1] "36512_at"

$AAK1
[1] "34949_at" "40628_at" "39456_at" "40572_at" "39463_at"

$AAMP
[1] "38434_at"
> idx <- unique(unlist(mget(thenames, rmap), use.names=FALSE))
> fibroEset[idx,]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12 features, 46 samples 
  element names: exprs 
phenoData
  sampleNames: 1, 2, ..., 46  (46 total)
  varLabels and varMetadata description:
    samp: sample code
    species: h: human, b: bonobo, g: gorilla
featureData
  featureNames: 38144_at, 34934_at, ..., 38434_at  (12 total)
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
  pubMedIds: 12840040 
Annotation: hgu95av2 
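
The reversal step can be illustrated without the annotation package. The
sketch below is not Biobase's reverseSplit() implementation; it uses
base R's split() on a hypothetical probe-to-symbol list (standing in for
`as.list(hgu95av2SYMBOL)`; probe IDs are made up). It also drops symbols
that are not in the map before lookup, since mget() on an environment
fails for missing keys unless 'ifnotfound' is supplied.

```r
# Hypothetical probe -> symbol map; several probes can share a symbol,
# and some probes have no symbol (NA)
probe2sym <- list("101_at" = "AAK1", "102_at" = "AADAC",
                  "103_at" = "AAK1", "104_at" = NA,
                  "105_at" = "AAMP")

# Reverse it: symbol -> character vector of probes.
# split() groups the probe names by symbol; NA symbols form no group.
syms <- unlist(probe2sym)
sym2probe <- split(names(syms), syms)

# Symbols from an external list, possibly including unknown ones
thenames <- c("AAK1", "AAMP", "NOTAGENE")
hits <- sym2probe[intersect(thenames, names(sym2probe))]

# Flatten to a unique vector of probe IDs, ready for eset[idx, ]
idx <- unique(unlist(hits, use.names = FALSE))
print(idx)
```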

This will be a bit simpler in the forthcoming release, where the
AnnotationDbi package provides 'revmap'.

Martin

> Best,
>
> Jim
>
>
>> 
>> Thanks,
>> 
>> D
>> 
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


