[BioC] custom Affymetrix

Wolfgang Huber whuber at embl.de
Wed Apr 28 10:18:14 CEST 2010


Dear Oana

the basic container in Bioconductor for expression data (where you would 
also put your custom data) is the ExpressionSet. This is essentially a 
matrix of expression values, plus a table with annotations for the rows
(features) -- this can e.g. be target names -- and a table with 
annotations for the columns (samples) -- this can e.g. be patient IDs.
The documentation mentioned below explains how to do data analysis steps 
such as finding differentially expressed genes or clustering on data in an 
ExpressionSet.
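
As a rough illustration of that structure, here is a minimal sketch using 
a toy ExpressionSet. The matrix values and the "target"/"patient" names are 
made up for the example and have nothing to do with GSE5479; the clustering 
step uses plain hclust on Euclidean distances, which is only one of many 
possible choices.

```r
library("Biobase")

set.seed(1)
# Toy data: 6 features (rows) x 4 samples (columns); all names are invented.
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("target", 1:6), paste0("patient", 1:4)))
eset <- ExpressionSet(assayData = m)

dim(exprs(eset))      # the expression matrix: features x samples
featureNames(eset)    # row (feature) identifiers
sampleNames(eset)     # column (sample) identifiers

# Hierarchical clustering of the samples (columns) on Euclidean distances
hc <- hclust(dist(t(exprs(eset))))
plot(hc)
```

The same accessors (exprs, featureNames, sampleNames) apply to any 
ExpressionSet, however it was created.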

AffyBatch is a container for raw Affymetrix data. It happens to 
internally share the same structure as an ExpressionSet, but except for 
some special cases (e.g. technical quality assessment), you don't do 
high-level data analysis on that. The function "rma" in the affy package 
is the most popular way of turning an AffyBatch into an ExpressionSet.
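
For raw CEL files, the usual route could be sketched as below. The helper 
name rma_from_cel_dir and its argument are hypothetical; ReadAffy and rma 
are the functions from the affy package. Note the assumption that a 
matching CDF environment for the chip type is installed, which for a 
custom array is not a given.

```r
library("affy")

# Hypothetical helper: read all CEL files in a directory into an AffyBatch
# and summarise them into an ExpressionSet with RMA.
# Assumes a CDF environment matching the chip type is available.
rma_from_cel_dir <- function(cel_dir) {
  raw <- ReadAffy(celfile.path = cel_dir)  # AffyBatch with probe-level data
  rma(raw)                                 # ExpressionSet, log2 scale
}
```

The result of such a call is an ExpressionSet, so everything said above 
about ExpressionSets applies to it.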

To load a dataset from GEO, you can use
   library("GEOquery")
   x = getGEO("GSE5479")

This gave me a list of 4 ExpressionSet objects, each with 3072 rows and 
255 columns. I didn't find it immediately obvious how to reconcile those 
with the experiment description on 
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5479, which mentions 
that 404 samples were measured in duplicate.
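
One way to start making sense of such a list is to tabulate the dimensions 
of its elements and check whether the sample names overlap. The sketch 
below uses an invented stand-in list (x_toy, built by the hypothetical 
helper make_toy) so that it runs without downloading anything; the sizes 
and names in it are not those of GSE5479.

```r
library("Biobase")

# Hypothetical helper that builds a small ExpressionSet filled with zeros.
make_toy <- function(nr, nc, prefix) {
  ExpressionSet(assayData = matrix(0, nrow = nr, ncol = nc,
    dimnames = list(paste0("f", seq_len(nr)),
                    paste0(prefix, seq_len(nc)))))
}

# Stand-in for the list returned by getGEO(); names and sizes are invented.
x_toy <- list(partA = make_toy(5, 3, "a"), partB = make_toy(5, 2, "b"))

# One row per list element: number of features and number of samples
sizes <- t(sapply(x_toy, dim))
sizes

# Do any sample names occur in more than one element?
all_samples <- unlist(lapply(x_toy, sampleNames))
anyDuplicated(all_samples) > 0
```

On the real object, t(sapply(x, dim)) and head(pData(x[[1]])) would be the 
corresponding first things to look at.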

Ideally, such a call to getGEO (or, similarly, to the "ArrayExpress" 
function in the package of the same name) will directly produce an 
ExpressionSet with which you can continue your analysis. In reality, however, 
often some further data reshuffling and/or normalisation is necessary, 
and the details depend on the dataset. The curators of GEO, or the 
submitters of the data, are the best people to ask for clarification here.

Perhaps downloading the data from 
http://www.mdl.dk/Publications_sup7.htm and parsing it yourself into an 
ExpressionSet is another approach - I haven't tried that. The 
vignette "Biobase - An introduction to Biobase and ExpressionSets" 
explains how to "Build an ExpressionSet From Scratch", given the various 
components.
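
A minimal sketch of that approach, assuming the downloaded data can be 
brought into two tab-delimited tables (one with expression values, one 
with clinical information). The table contents, column names and sample 
names below are invented for illustration; the real files will need their 
own parsing.

```r
library("Biobase")

# Invented stand-ins for a parsed expression table and a clinical table.
expr_txt <- "target\tS1\tS2\tS3
geneA\t7.1\t6.8\t7.4
geneB\t5.2\t5.9\t5.5"
clin_txt <- "sample\tstage
S1\tTa
S2\tT1
S3\tT2"

expr_tab <- read.delim(text = expr_txt, row.names = 1)
clin_tab <- read.delim(text = clin_txt, row.names = 1)

# Sample names must agree between matrix columns and annotation rows
stopifnot(identical(colnames(expr_tab), rownames(clin_tab)))

eset <- ExpressionSet(assayData = as.matrix(expr_tab),
                      phenoData = AnnotatedDataFrame(clin_tab))
eset
```

The check on the sample names is worth keeping in real code, since a 
mismatch between the two tables is the most common stumbling block here.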

Hope this helps.

	Best wishes
	Wolfgang






Vermesan Oana wrote:
> I want to thank you for the prompt answer. 
> 
> I know that the data from that URL has been preprocessed, it was just an example of custom Affymetrix. I found other
> custom Affymetrix datasets that have raw data, that I need to preprocess (such as http://www.mdl.dk/Publications_sup7.htm). 
> I also have the book that you mentioned, and the examples are on Affymetrix data and I can't use them on my dataset. I've tried 
> but I get the same error - the object that I use must be an Affybatch.
> I believe that there are some special packages that can be used on custom data and I want to know which one.
> 
> 
> Thank you again and I look forward to your answer,
> Oana
> 
> 
> 
> 
> ________________________________
> From: Wolfgang Huber <whuber at embl.de>
> To: Vermesan Oana <oana.vermesan at yahoo.com>
> Cc: bioconductor at stat.math.ethz.ch
> Sent: Tue, April 27, 2010 2:46:12 PM
> Subject: Re: [BioC] custom Affymetrix
> 
> 
> Dear Oana,
> 
> I am sure you have read the experiment description at the URL you kindly sent us, where it says "The final average log2 values are supplied as supplemental files to this GEO series." This indicates that the data producers consider no further background correction or normalization steps to be necessary.
> 
> For differential expression, you might have a look, for instance, at the respective chapters in the book  http://www.bioconductor.org/pub/biocases
> which discusses the other questions (differential analysis and clustering) you asked.
> 
>     Best wishes
>     Wolfgang
> 
> 
> Vermesan Oana wrote on 27/04/10 08:05:
>> Good morning,
>>
>> I started using Bioconductor about a month ago. I'm working with Affymetrix and Agilent data. I have the entire workflow
>> and I know how to use the packages for this type of data. Recently I have found some custom Affymetrix data and I don't
>> know how to process it. I have only an .xls file with the measured genes and a .txt with the clinical information. These two
>> can be found here http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5479. Can someone tell me which packages
>> to use in order to do the background correction, to normalize the data and to do a differential analysis? I also need to do
>> a hierarchical clustering. If someone could help me I would really appreciate that. 
>> Thank you,
>> Oana
>>
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 

-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


