[BioC] where to start?

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed Apr 20 15:52:17 CEST 2005

OK, off the top of my head, *if* I wanted to use limma to apply some
linear models across my data set (which would, I think, tell me which
genes were changing significantly between two or more phenotypes):

1) Edit the data in Excel and get rid of the Class row
2) Save the data as text (tab or space delimited, it doesn't matter)
3) Read the data into R using read.table() (which by default splits data
into columns based on white-space)
4) Manipulate the data - what we are trying to get is a matrix of data
in R, where the rownames() of the matrix are the gene names, the
colnames() of the matrix are the experiment names and the values are the
expression values.  What you get back from read.table() will be like
this, but not quite.  Install the Biobase package, load it and type
?exprSet.  Read that entire help file, execute the example, and pay
particular attention to the format of the geneData matrix
5) After reading the help file and following its example with your own
matrix, you will have an exprSet object, which can be used by limma.
6) Read the limma manual.  Some of the bits of the manual which refer to
affy data include examples of using limma with exprSet objects.
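
For what it's worth, steps 2-4 might look roughly like the sketch below.
Everything specific in it is an assumption for illustration: the inline
text stands in for the tab-delimited file saved from Excel (with the
Class row already removed), the sample names, gene names and values are
made up, and the class labels are typed in by hand.  The limma part at
the end only runs if limma is installed; it relies on the fact that
lmFit() also accepts a plain numeric matrix, and it is a minimal
two-class example, not a recipe for the full analysis.

```r
# Stand-in for the text file saved from Excel (Class row already deleted).
# All names and numbers below are invented for illustration.
txt <- paste(
  "GenId\tGeneName\tSample1\tSample2\tSample3\tSample4",
  "1\tgene1\t0.04\t0.05\t0.60\t0.70",
  "2\tgene2\t0.34\t0.30\t0.31\t0.36",
  "3\tgene3\t0.90\t0.80\t0.20\t0.10",
  sep = "\n")

# With a real file this would be read.table("myfile.txt", ...) instead of
# text = txt.  sep = "\t" is safer than the white-space default if gene
# names can contain spaces.
raw <- read.table(text = txt, header = TRUE, sep = "\t",
                  quote = "", stringsAsFactors = FALSE)

# Drop the two annotation columns and keep the numbers as a matrix:
# rows are genes, columns are samples.
expr <- as.matrix(raw[, -(1:2)])
rownames(expr) <- raw$GeneName

# Class labels for the samples, in column order (typed in by hand here).
classes <- factor(c("C1", "C1", "C2", "C2"))

# Optional: a two-class comparison with limma, if it is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ 0 + classes)
  colnames(design) <- levels(classes)
  fit <- limma::lmFit(expr, design)
  contr <- limma::makeContrasts(C2 - C1, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contr))
  print(limma::topTable(fit2, number = 3))
}
```

Keeping the gene names as rownames() means the annotation travels with
the numbers through whatever you do next.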

:-)  Most of all have fun.  R can be a tough learning curve, but it's
worth it.  Let me know how you get on.


-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Malik
Sent: 20 April 2005 17:06
To: 'Sean Davis'
Cc: yousef at wistar.org; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] where to start?

I have data that has been preprocessed to give the expression value for
each gene; there are 19200 genes in the experiments and 186 samples.
The samples define 32 phenotypes (classes). I would like to
find the significant genes among 10 different combinations of classes
and then find the intersection between those lists of significant
genes.

My problem is how to read this simple data into any Bioconductor
package, since it seems that the Bioconductor input formats mostly
require the image format (or I'm missing something here). I want to
read the input file and keep track of the gene ID and the
gene name. So please provide me with a simple example of reading this
input format into any basic Bioconductor package. For simplicity,
consider that we have a table as follows:
GenId  GeneName  Sample1  Sample2  Sample3  Sample4  Sample5
Class            C1       C1       C2       C3       C4                C1
1      gene1     0.04     0.05     0.06     0.7      0.8      .......  0.9

Where the second row has the class labels, and from the third row on we
have the gene expression values (just numbers!!).
So I want to read this format into a specific Bioconductor package (say
limma/?) and start applying different functions.

So again, I want to know how to read this file into the package???

From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
Sent: Wednesday, April 20, 2005 2:56 AM
To: yousef at wistar.org
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] where to start?

On Apr 20, 2005, at 1:56 AM, Malik Yousef wrote:

> Hello,
> I have a gene expression data set built up from rows of gene
> expression values, as follows:
> GeneID  GeneName      Sample1    .......... Samplen
>  Category                      +1      ...........-1
>  1             gene1            0.5 ..............0.67
>  2             gene2            0.34 ............. 0.78
> How could I use Bioconductor to analyze this data set and get the most
> informative genes, classification, clustering, etc.?


You will have to decide what specific questions you want to answer
using your data.  To get a sense of what bioconductor has to offer, try
looking here:

The vignettes give a lot of detail about how to use different packages.
The BioConductor Short Courses are very helpful as a starting place.

When you run into specific problems, ask here.  If you want more help
here, you will probably have to be more specific about your data, what
you have tried, and what hasn't worked.  Single channel or two-color?
Patient samples or cell lines or something else?  Expression or CGH?
How many classes of sample?  What are the research questions?


