[BioC] normalisation assumptions (violation of)

Tue Aug 8 23:06:21 CEST 2006

Quoting Henrik Bengtsson <hb at maths.lth.se>:

>
> In the bigger picture, given that you can identify those 20-30% DEs,
> how are you going to interpret such a large list of genes?
>
> /H

The number of "useful" genes is much smaller. This is because my 
experiment consists of 4 separate sub-experiments, all using a common 
reference (untransfected cells, in this case). Three of the 
sub-experiments consist of the hybridisation of transfected cells vs. 
untransfected. The transfection is of a construct expressing a fusion 
protein: the first part contains a DNA-binding domain with a certain 
sequence specificity (that we expect to occur in many promoters), the 
second is a strong transactivator. I'm hoping to detect the binding of 
these protein domains by looking at what genes are upregulated, 
especially those that are only expressed after transfection. There are 
three sub-experiments because each uses a slightly different protein. The 
fourth experiment is a control: one of the previous fusion proteins 
with a couple of point mutations that we know abolish strong 
specific DNA binding. Transfection of this construct still results in 
upregulation of many genes. What I do is analyse all the data together 
(same common reference), and remove the DE genes of the control 
experiment (using an FDR of 0.05% or 0.01% as cut-off) from the other 
three. This substantially reduces the number of genes. From the 
remainder, I then focus on those that have negligible expression in the 
untransfected cells and decent expression afterwards. I then compare 
this with what happened in the control experiment (even though those 
genes were not picked as DE in it). At the end I have tens of 
candidates, fewer than 100. It's not a crazy number, and I can then 
proceed to verification by RT etc., and the biology starts.
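
The filtering steps above could be sketched roughly as follows. This is 
only an illustration in Python, not the code used for the analysis: the 
function name, the gene and experiment names, the data layout and the 
two log2 intensity cut-offs (low, high) are all made up for the example.

```python
# Illustrative sketch only: gene names, experiment names, thresholds and
# data layout are assumptions, not taken from the actual analysis.

def candidate_genes(de_by_experiment, control, ref_expr, trans_expr,
                    low=7.0, high=9.0):
    """Filter DE gene lists against a control experiment.

    de_by_experiment: dict, experiment name -> set of DE gene IDs
    control:          name of the control experiment (a key of the dict above)
    ref_expr:         dict, gene -> log2 intensity in the untransfected reference
    trans_expr:       dict, experiment -> dict gene -> log2 intensity after
                      transfection
    low, high:        hypothetical cut-offs for "negligible" and "decent"
                      expression
    """
    control_de = de_by_experiment[control]
    result = {}
    for name, de_genes in de_by_experiment.items():
        if name == control:
            continue
        # Step 1: drop every gene that is also DE in the control.
        remaining = de_genes - control_de
        # Step 2: keep genes that are near-absent in the reference and
        # clearly expressed after transfection.
        result[name] = sorted(
            g for g in remaining
            if ref_expr[g] < low and trans_expr[name][g] > high
        )
    return result


# Toy data: three fusion constructs plus the mutated control.
de = {
    "fusionA": {"g1", "g2", "g3", "g4"},
    "fusionB": {"g2", "g5"},
    "fusionC": {"g1", "g5", "g6"},
    "mutant_control": {"g2", "g6"},
}
ref = {"g1": 6.1, "g2": 6.0, "g3": 10.2, "g4": 6.5, "g5": 6.3, "g6": 6.2}
trans = {
    "fusionA": {"g1": 11.0, "g2": 12.0, "g3": 12.5, "g4": 7.5},
    "fusionB": {"g2": 11.5, "g5": 10.0},
    "fusionC": {"g1": 9.5, "g5": 8.0, "g6": 11.0},
}

candidates = candidate_genes(de, "mutant_control", ref, trans)
print(candidates)
```

The last step, looking at how each surviving candidate behaved in the 
control even though it was not called DE there, is left as a manual 
check in this sketch.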

When we started the experiment we were not sure what we would get. In 
theory we could get thousands of genes. It all depends on how good our 
control is. That's why I used a simple common reference design, as it 
allows us to easily add another control if we find a better one.

I already analysed a set of data on a cell line, with RNA prepared by 
somebody else. It worked pretty well, but the effect wasn't as great as 
I am seeing here. The transfection efficiency may have something to do 
with it. I checked all my transfections by Western blot and only used 
the ones that gave me strong expression of the fusion protein; I 
suspect the other person wasn't so picky.

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK