I would greatly appreciate any help with the following.

We have recently conducted a microarray experiment to investigate the 
effect of amplification on expression ratios. The design of the experiment
is as follows:

group1: DRG amplified RNA (4 Affymetrix replicates)
group2: spinal cord amplified RNA (4 Affymetrix replicates)
group3: DRG unamplified RNA (3 Affymetrix replicates)
group4: spinal cord unamplified RNA (3 Affymetrix replicates)

I fed all data through to limma and specified the following contrasts:
contrast1: group1-group2
contrast2: group3-group4
contrast3: (group1-group2) v (group3-group4)
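
To make the three contrasts concrete, here is a minimal sketch in plain
Python (not limma code) that writes them as weight vectors over the four
group means. Reading the "v" in contrast3 as a difference of differences
is an assumption, and the group means are made-up numbers for a single
hypothetical gene.

```python
# Weights are ordered group1, group2, group3, group4.
CONTRASTS = {
    "contrast1": [1, -1, 0, 0],   # group1 - group2
    "contrast2": [0, 0, 1, -1],   # group3 - group4
    # Assumption: "v" is read as (group1 - group2) - (group3 - group4).
    "contrast3": [1, -1, -1, 1],
}


def apply_contrasts(group_means):
    """Return the value of each contrast for one gene's four group means."""
    if len(group_means) != 4:
        raise ValueError("expected one mean per group (4 values)")
    return {
        name: sum(w * m for w, m in zip(weights, group_means))
        for name, weights in CONTRASTS.items()
    }


# Hypothetical log2 group means for one gene (illustrative only).
example = apply_contrasts([9.0, 7.0, 8.5, 7.0])
```

With these weights, contrast3 is simply contrast1 minus contrast2 for
every gene.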

I began by examining the concordance between contrasts 1 and 2: I 
compared the top 100 genes (going up to the top 1000) from each contrast 
and looked at the extent of agreement between them. The results are 
quite nice, showing on average more than 80% agreement.
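
One way such an agreement percentage could be computed is sketched below
in plain Python. This is an assumption about the comparison, not the
exact script used; the function name and the gene IDs are hypothetical,
and each list is assumed to hold unique IDs ranked from most to least
significant.

```python
def top_n_agreement(ranked_a, ranked_b, n):
    """Percentage of the top n genes of ranked_a also in the top n of ranked_b."""
    if n <= 0 or n > min(len(ranked_a), len(ranked_b)):
        raise ValueError("n must be between 1 and the shorter list length")
    top_a = set(ranked_a[:n])
    top_b = set(ranked_b[:n])
    return 100.0 * len(top_a & top_b) / n


# Hypothetical ranked gene IDs for the two contrasts (illustrative only).
ranked_1 = ["g1", "g2", "g3", "g4", "g5", "g6"]
ranked_2 = ["g2", "g1", "g7", "g3", "g8", "g4"]

# Agreement at several cut-offs, analogous to top 100 ... top 1000.
agreement = {n: top_n_agreement(ranked_1, ranked_2, n) for n in (2, 4, 6)}
```

Note that the measure ignores the order of genes within the top n; only
membership counts.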

However, if I feed only group1 and group2 to limma, without the rest of 
the data, and repeat contrast1, and separately feed only group3 and 
group4 to limma and repeat contrast2, the percentages are very different 
and vary with the number of genes selected. For example:

No. of top genes   % agreement between contrast 1 and 2
100                27
300                40
500                50
2000               60

I have a feeling that the former is more correct statistically speaking 
but I can't say for sure what is correct and justify the difference 
between the two observations. 

I tried to read more about the statistical theory of limma but it seems 
well above my modest understanding of statistics. Can somebody help me 
resolve this confusion? In particular, I want to know which of the two 
approaches is more sensible and what is causing the difference between 
them.

NB: all chips are Affymetrix.
Many thanks 

