[BioC] Limma: Nimblegen array data import?

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 29 18:33:17 CET 2007


Mark, Dave -- It's true that lmFit works with a basic matrix, but it's
not too hard to create an ExpressionSet:

1. Read in the expression data (you'll have to do this anyway):

> dataDirectory <- system.file("extdata", package="Biobase")
> exprsFile <- file.path(dataDirectory, "exprsData.txt")
> exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t",
+                               row.names=1,
+                               as.is=TRUE))

Then create an ExpressionSet:

> library(Biobase)
> mySet1 <- new("ExpressionSet", exprs=exprs)

That's it.
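
For a file of your own the pattern is the same. The sketch below is
only an illustration: it writes a small, made-up tab-delimited table
to a temporary file (the probe names, sample names and values are all
hypothetical, and the real layout of allcalls.txt is not known here),
then reads it back and wraps it in an ExpressionSet.

```r
library(Biobase)

# Hypothetical stand-in for a table of normalized expression values:
# first column = probe identifiers, remaining columns = one per array.
tmp <- tempfile(fileext = ".txt")
toy <- data.frame(ProbeID = paste0("probe", 1:4),
                  A = c(7.1, 8.2, 6.5, 9.0),
                  B = c(7.3, 8.0, 6.7, 9.4),
                  C = c(6.9, 8.5, 6.4, 9.1))
write.table(toy, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Same pattern as step 1: read, convert to a matrix, wrap.
m <- as.matrix(read.table(tmp, header = TRUE, sep = "\t",
                          row.names = 1, as.is = TRUE))
stopifnot(is.numeric(m))   # stops early if a text column slipped in
eset <- new("ExpressionSet", exprs = m)
dim(eset)
```

The is.numeric check is worth keeping: a stray annotation column
turns the whole matrix into character, which is easier to catch here
than later in the analysis.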

Why bother? Because you can now start to let the software do your work
for you, reducing errors and improving reproducibility. For instance,
almost all experiments have data that describe the phenotypes
('phenotypes' broadly defined to mean characteristics, phenotypic,
genetic, or otherwise) of the samples. These sample descriptions can
be read from a file and captured as an AnnotatedDataFrame:

2. Read in phenotypic data

> pDataFile <- file.path(dataDirectory, "pData.txt")
> pData <- read.table(pDataFile,
+                     row.names=1, header=TRUE, sep="\t")

3. Create an annotated data frame

> phenoData <- new("AnnotatedDataFrame", data=pData)

We can then add that to the existing ExpressionSet

> phenoData(mySet1) <- phenoData

or, if we've thought ahead, create an ExpressionSet directly

> mySet2 <- new("ExpressionSet", exprs=exprs, phenoData=phenoData)

Why is this helpful? It coordinates the sample and phenotype
information, so, e.g., if we subset the samples, we also subset the
relevant phenoData:

> dim(mySet2)
Features  Samples 
     500       26 
> dim(mySet2[,mySet2$gender=="Male"])
Features  Samples 
     500       15 
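
The same behaviour can be seen on a tiny simulated object, where it
is easy to check by eye. This is a sketch with made-up gene names,
sample names and gender labels, not the Biobase example data:

```r
library(Biobase)

set.seed(1)
# 3 features x 4 samples of simulated values
m <- matrix(rnorm(12), nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))
# One row of phenotype data per sample, row names matching colnames(m)
pd <- data.frame(gender = c("Male", "Female", "Male", "Female"),
                 row.names = colnames(m))
eset <- new("ExpressionSet", exprs = m,
            phenoData = new("AnnotatedDataFrame", data = pd))

# Subsetting columns subsets the expression values and pData together
males <- eset[, eset$gender == "Male"]
sampleNames(males)
pData(males)
```

After the subset, the row names of pData(males) and the column names
of exprs(males) still agree, with no bookkeeping on our part.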

It also helps us to avoid, e.g., mismatches between sample and
phenotype data:

> badData <- pData[sample(rownames(pData), nrow(pData)),]
> badPhenoData <- new("AnnotatedDataFrame", data=badData)
> mySet3 <- new("ExpressionSet", exprs=exprs, phenoData=badPhenoData)
Error in validObject(.Object) : 
  invalid class "ExpressionSet" object: sampleNames differ between assayData and phenoData
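
If the phenotype table has the right samples but in the wrong order,
one way to repair it is to index its rows by the column names of the
expression matrix before building the object. A minimal sketch, again
with made-up names and values:

```r
library(Biobase)

m <- matrix(as.numeric(1:6), nrow = 2,
            dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
# Phenotype rows deliberately out of order relative to colnames(m)
pd <- data.frame(gender = c("Female", "Male", "Male"),
                 row.names = c("s3", "s1", "s2"))

# Check that the two tables describe the same samples, then reorder
stopifnot(setequal(rownames(pd), colnames(m)))
pd <- pd[colnames(m), , drop = FALSE]

eset <- new("ExpressionSet", exprs = m,
            phenoData = new("AnnotatedDataFrame", data = pd))
eset$gender
```

The setequal check matters: reordering by name only makes sense when
both tables contain exactly the same sample names.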

Coordinating expression and phenotype data, and avoiding subtle
errors, seem like good reasons to start down the ExpressionSet road;
there's a more comprehensive introduction in the Biobase vignette 'An
introduction to Biobase and ExpressionSets':

> openVignette()
Please select a vignette:  

1: Biobase - An introduction to Biobase and ExpressionSets
2: Biobase - Bioconductor Overview
3: Biobase - esApply Introduction
4: Biobase - Notes for eSet developers
5: Biobase - Notes for writing introductory 'how to' documents
6: Biobase - quick views of eSet instances
7: limma - Limma Vignette

Selection: 1
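
To return to the original question, an ExpressionSet can be passed
directly to lmFit, with the design matrix built from its phenotype
data. The following is only a sketch on simulated data: the
'treatment' column, its levels and the two-group design are
assumptions made for illustration, and a real analysis needs a design
that matches the actual experiment.

```r
library(Biobase)
library(limma)

set.seed(42)
# Simulated log-scale expression values: 100 features x 6 samples
m <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
            dimnames = list(paste0("g", 1:100), paste0("s", 1:6)))
# Hypothetical two-group layout, three arrays per group
pd <- data.frame(treatment = factor(rep(c("control", "treated"), each = 3)),
                 row.names = colnames(m))
eset <- new("ExpressionSet", exprs = m,
            phenoData = new("AnnotatedDataFrame", data = pd))

# Design matrix taken from the phenotype data stored in the object
design <- model.matrix(~ treatment, data = pData(eset))

# Linear model per feature, then moderated statistics
fit <- eBayes(lmFit(eset, design))
top <- topTable(fit, coef = "treatmenttreated", number = 5)
top
```

Because the design comes from pData(eset), any subsetting of the
object carries through to the model without a separate sample table.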

Martin

Mark Robinson <mrobinson at wehi.EDU.AU> writes:

> How about just running 'limma' on the table of normalized expression  
> values?
>
> In ?lmFit, the object which gets operated on doesn't have to be an  
> exprSet.
>
> M.
>
> On 29/11/2007, at 2:52 PM, Dave Berger wrote:
>
>> Previously, I have used limma for two-colour array analysis.
>> I now have a new data set from Nimblegen arrays for which RMA
>> normalization has been completed, and I wish to identify
>> differentially expressed genes from the allcalls.txt file, which is
>> a table of expression values for all the treatments in one file.
>> Question:
>> If I wish to do a single-channel analysis in "limma", I would
>> appreciate suggestions on importing this data, i.e. how do I convert
>> the data to an "exprSet" object?
>>
>> thanks
>> Dave Berger
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


