[BioC] where to start?

Malik Yousef yousef at wistar.org
Wed Apr 20 18:06:19 CEST 2005

I have data that have been preprocessed to give the expression value for
each gene. There are 19200 genes in the experiments and 186 samples. The
samples define 32 phenotypes (classes). I would like to find the
significant genes for 10 different combinations of classes and then find
the intersection of those lists of significant genes.
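
For the last step, once each class comparison has produced a list of significant gene IDs, the lists can be intersected in base R. A minimal sketch, assuming the lists are character vectors of gene IDs; the comparison names and gene IDs below are hypothetical and only illustrate the idea:

```r
# Hypothetical lists of significant gene IDs, one per class comparison.
sig_lists <- list(
  C1_vs_C2 = c("gene1", "gene3", "gene7", "gene9"),
  C1_vs_C3 = c("gene3", "gene4", "gene7"),
  C2_vs_C3 = c("gene7", "gene3", "gene8")
)

# Reduce() applies intersect() pairwise across the whole list,
# leaving only the genes that appear in every comparison.
common <- Reduce(intersect, sig_lists)
print(common)

# Pairwise overlap counts can also be useful before intersecting everything.
overlap <- sapply(sig_lists, function(a)
  sapply(sig_lists, function(b) length(intersect(a, b))))
print(overlap)
```

With 10 comparisons the same call works unchanged; only the length of `sig_lists` grows.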

My problem is how to read this simple data into any Bioconductor package,
since it seems that the Bioconductor input mostly requires the image
format (or I'm missing something here). I want to read the input file and
keep track of the gene ID and the gene name.
So please just give me a simple example of reading this input format into
any basic Bioconductor package. For simplicity, suppose we have a table as
follows:
GenId   GeneName   Sample1   Sample2   Sample3   Sample4   Sample5   ......   SampleN
Class              C1        C1        C2        C3        C4        ......   C1
1       gene1      0.04      0.05      0.06      0.7       0.8       ......   0.9

Here the second row holds the class labels, and the gene expression values
start at the third row (just numbers!!).
So I want to read this format into a specific Bioconductor package (say
limma?) and start applying different functions.

So again, I want to know how to read this file into the package.
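
A table in this layout can be read with base R and then handed to limma as a plain numeric matrix. The following is only a sketch under several assumptions: the file is tab-delimited, the class row leaves the GeneName column empty, and the small file written here (its values, the four samples, the two classes) is made up for illustration. The limma step runs only if limma is installed.

```r
# Write a tiny tab-delimited file in the layout described above
# (hypothetical values; the real file would have 19200 genes and 186 samples).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "GenId\tGeneName\tSample1\tSample2\tSample3\tSample4",
  "Class\t\tC1\tC1\tC2\tC2",
  "1\tgene1\t0.04\t0.05\t0.60\t0.70",
  "2\tgene2\t0.34\t0.30\t0.31\t0.78",
  "3\tgene3\t0.90\t0.85\t0.20\t0.25",
  "4\tgene4\t0.50\t0.55\t0.52\t0.49"
), path)

# Rows 1-2 are read as text: row 1 = column names, row 2 = class labels.
head2 <- read.delim(path, header = FALSE, nrows = 2,
                    colClasses = "character")
sample_names <- unlist(head2[1, -(1:2)], use.names = FALSE)
classes <- factor(unlist(head2[2, -(1:2)], use.names = FALSE))

# The remaining rows hold the gene annotation and the expression values.
body <- read.delim(path, header = FALSE, skip = 2, stringsAsFactors = FALSE)
genes <- data.frame(GenId = body[[1]], GeneName = body[[2]],
                    stringsAsFactors = FALSE)
expr <- as.matrix(body[, -(1:2)])
dimnames(expr) <- list(as.character(genes$GenId), sample_names)

# limma accepts a numeric matrix (genes in rows, samples in columns).
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ 0 + classes)   # one column per class
  colnames(design) <- levels(classes)
  fit <- limma::lmFit(expr, design)
  cont <- limma::makeContrasts(C2 - C1, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))
  print(limma::topTable(fit2, number = 4, genelist = genes))
}
```

Keeping `genes` as a separate data frame and passing it as `genelist` is one way to carry the gene ID and gene name through to the results table; each additional class comparison would be another contrast in `makeContrasts()`.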

From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
Sent: Wednesday, April 20, 2005 2:56 AM
To: yousef at wistar.org
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] where to start?

On Apr 20, 2005, at 1:56 AM, Malik Yousef wrote:

> Hello,
> I have a gene expression data set built up from rows of gene
> expression values as follows:
> GeneID     GeneName   Sample1   ..........  Samplen
> Category              +1        ........... -1
> 1          gene1      0.5       ........... 0.67
> 2          gene2      0.34      ........... 0.78
> How could I use Bioconductor to analyze this data set and find the most
> informative genes, do classification, clustering, etc.?


You will have to decide what specific questions you want to answer  
using your data.  To get a sense of what bioconductor has to offer, try  
looking here:


The vignettes give a lot of detail about how to use different packages.
The Bioconductor Short Courses are very helpful as a starting place.
When you run into specific problems, ask here.  If you want more help
here, you will probably have to be more specific about your data, what
you have tried, and what hasn't worked.  Single channel or two-color?
Patient samples or cell lines or something else?  Expression or CGH?
How many classes of sample?  What are the research  


