[BioC] Design matrix for time course analysis with maSigPro

Matthias Boeck boeckm at in.tum.de
Thu May 27 12:04:48 CEST 2010


Hello,

I'm working on the analysis of time series data (MAS5) which consists of
two experiments (expA, expB) on two cell lines (clA, clB). The two cell
lines are expected to behave similarly, if not identically, so they
might be used as replicates. Each experiment consists of four
measurements at different points in time (6h, 24h, 72h and 144h), and
for each of these measurements a control exists too. The controls are
cell cultures of the same cell line which are untreated but can still
show some activity.

At the moment I am trying to find the differences, and especially the
similarities, between the two experiments in their response to the
treatments, and I wanted to use maSigPro (if you have another
suggestion, I would be glad of any further advice).
I have already done some calculations with the package, but I'm not sure
whether I got the design matrix right, and maybe you could be so kind as
to take a look at it. Replicates are defined within the experiments, and
I used four dummy variables for the different experiments and cell
lines:


                    Time Replicate Control expB_clA expB_clB expA_clA expA_clB
clA_6hr_expA_ctr       6         1       1        0        0        0        0
clA_6hr_expA           6         2       0        0        0        1        0
clA_24hr_expA_ctr     24         3       1        0        0        0        0
clA_24hr_expA         24         4       0        0        0        1        0
clA_day3_expA_ctr     72         5       1        0        0        0        0
clA_day3_expA         72         6       0        0        0        1        0
clA_day6_expA_ctrl   144         7       1        0        0        0        0
clA_day6_expA        144         8       0        0        0        1        0
clB_6hr_expA_ctr       6         1       1        0        0        0        0
clB_6hr_expA           6         2       0        0        0        0        1
clB_24hr_expA_ctr     24         3       1        0        0        0        0
clB_24hr_expA         24         4       0        0        0        0        1
clB_day3_expA_ctr     72         5       1        0        0        0        0
clB_day3_expA         72         6       0        0        0        0        1
clB_day6_expA_ctr    144         7       1        0        0        0        0
clB_day6_expA        144         8       0        0        0        0        1
clA_6hr_expB_ctr       6         9       1        0        0        0        0
clA_6hr_expB           6        10       0        1        0        0        0
clA_24hr_expB_ctr     24        11       1        0        0        0        0
clA_24hr_expB         24        12       0        1        0        0        0
clA_day3_expB_ctr     72        13       1        0        0        0        0
clA_day3_expB         72        14       0        1        0        0        0
clA_day6_expB_ctr    144        15       1        0        0        0        0
clA_day6_expB        144        16       0        1        0        0        0
clB_6hr_expB_ctr       6         9       1        0        0        0        0
clB_6hr_expB           6        10       0        0        1        0        0
clB_24hr_expB_ctr     24        11       1        0        0        0        0
clB_24hr_expB         24        12       0        0        1        0        0
clB_day3_expB_ctr     72        13       1        0        0        0        0
clB_day3_expB         72        14       0        0        1        0        0
clB_day6_expB_ctr    144        15       1        0        0        0        0
clB_day6_expB        144        16       0        0        1        0        0
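
To make the layout of the table explicit, here is a minimal sketch that
rebuilds the same rows from the factor levels. It is plain Python for
illustration only, not maSigPro code; the function name build_design and
the constants TIMES and GROUPS are hypothetical. It also assumes every
control sample name ends in "_ctr", whereas one row in the table is
spelled "_ctrl".

```python
# Illustrative sketch only: regenerates the design table from its factors.
# Time labels as used in the sample names, paired with hours.
TIMES = [("6hr", 6), ("24hr", 24), ("day3", 72), ("day6", 144)]
# Dummy-variable columns, in the same order as the table header.
GROUPS = ["expB_clA", "expB_clB", "expA_clA", "expA_clB"]


def build_design():
    """Return {sample_name: row} with the columns of the table above."""
    design = {}
    for exp_index, exp in enumerate(["expA", "expB"]):
        for cell_line in ["clA", "clB"]:
            for time_index, (label, hours) in enumerate(TIMES):
                for is_control in (True, False):
                    # Replicate IDs run 1..8 in expA and 9..16 in expB and
                    # are shared by both cell lines, as in the table.
                    replicate = (exp_index * 8 + time_index * 2
                                 + (1 if is_control else 2))
                    name = f"{cell_line}_{label}_{exp}"
                    if is_control:
                        name += "_ctr"
                    row = {"Time": hours, "Replicate": replicate,
                           "Control": int(is_control)}
                    # A treated sample sets exactly one dummy variable;
                    # a control sets none of them.
                    for group in GROUPS:
                        is_member = group == f"{exp}_{cell_line}"
                        row[group] = int(is_member and not is_control)
                    design[name] = row
    return design


design = build_design()
```

Every row then has exactly one of the five indicator columns (Control
plus the four dummy variables) set to 1, and the controls of both cell
lines and both experiments all fall into the single Control group.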


With this design I end up with about 1179 probes after the first
regression step (p.vector() with a q-value of 0.0001). I'm not sure
whether this is a realistic number, or whether it is a consequence of
the design or of the lack of further replicates (array quality checks
have already been performed on the data). Would non-specific filtering
make sense before the analysis?
I also tried changing the Replicate column so that the controls are
grouped by cell line, but this didn't seem to alter the results. Does
the algorithm take the mean/median over all given controls without
considering the replicate grouping? Or could this be a hint that the
controls are quite similar and could be combined? If the controls are
grouped together as replicates, does maSigPro use their median in the
calculation, or is the median only used for the see.genes()
visualization?
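
By non-specific filtering I mean dropping probes that hardly vary across
the arrays before any model is fitted, without looking at the design. A
minimal sketch of the idea, in plain Python for illustration only: the
function names, the toy values and the cutoff of 0.5 are all
hypothetical, and the values are assumed to be on a log2 scale.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of one probe's values across all arrays."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def nonspecific_filter(expression, min_iqr):
    """Keep probes whose IQR across arrays is at least min_iqr.

    The filter ignores the experimental design entirely; it only looks
    at how much each probe varies.
    """
    return {probe: values for probe, values in expression.items()
            if iqr(values) >= min_iqr}


# Toy data (hypothetical): probe -> assumed log2 expression per array.
expression = {
    "flat": [5.0, 5.1, 5.0, 5.1],
    "changing": [2.0, 4.0, 6.0, 8.0],
}
kept = nonspecific_filter(expression, min_iqr=0.5)
```

Here only the "changing" probe passes the filter; the cutoff itself
would have to be chosen for the real data.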


I'm sorry for all these questions, but I haven't worked with the time
series packages in R before and I'm not sure whether I'm using the
methods correctly.
I would be glad of any help!


Best wishes,
Matthias


