[BioC] design matrix

Wolfgang Huber huber at ebi.ac.uk
Wed May 23 15:39:48 CEST 2007


Dear Lev,

>   Thank you for the reply. I have checked it and it is something to do with vsn, it changes its output somehow when you change the sample order. Below is an example for the same data (6 samples, treatments 1 and 2) only in the ordered data i have them as 1,1,1,2,2,2 and in the unordered data i have them as 2,1,1,2,2,1.

There are two issues:

- After ML estimation of the normalization transformation, a constant
overall offset is added by vsn to the resulting data in order to ensure
that the result for high-intensity features is approximately the same as
the log-transformation (i.e. log(x)~glog(x) for large x). This should
not affect any differential expression analysis. In "vsn", this offset
is calculated for the first array (and applied to all). In "vsn2"
(versions 2.x of the package), the offset is called from an 'average
array' (and applied to all). This should reduce confusion about
order-dependence. So please consider updating your package and using vsn2.

Note that since we are on log-scale, adding an overall offset is
equivalent to a change of units, like measuring distances in km versus
in miles. As long as it's done consistently, it has no significance.

- More substantially, the result of vsn is a ML estimate obtained from a
numerical optimisation algorithm, and the results of this can vary
slightly depending on initial conditions and convergence thresholds.
However, the variability introduced by this should be negligible
compared to the biological or technical variability. This also seems to
apply to your example. However, please let me know if this is not the case.

Best wishes
 Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber




>   I would be really grateful for any comments.
>   Lev.
>    
>    
>   1. ordered data:
>   temp_ord<-vsn(signals_ord)
>   design_ord <- model.matrix(~0 +factor(c(1,1,1,2,2,2)))
>   colnames(design_ord) <- c("group1", "group2")
>   contrast.matrix <- makeContrasts(group2-group1, levels=design_ord)
>   exprs(temp_ord)<- log2(exp(exprs(temp_ord)))
>   fit_ord <- lmFit(temp_ord, design_ord)
>   fit2_ord <- contrasts.fit(fit_ord, contrast.matrix)
>   fit2_ord <- eBayes(fit2_ord)
>   topTable(fit2_ord, adjust="BH")
>    
>             ID     logFC  AveExpr         t      P.Value    adj.P.Val        B
>   168840 -5.302013 10.23113 -27.83760 3.589533e-09 2.716231e-05 11.23349
>   101140 -5.174833 10.22872 -27.68107 3.751573e-09 2.716231e-05 11.20243
>   123375 -4.265788 11.76289 -27.65844 3.775669e-09 2.716231e-05 11.19792
>   194271 -4.237924 14.03339 -27.06015 4.480631e-09 2.716231e-05 11.07621
>    
>   2. unordered data:
>   temp<-vsn(signals)
>   design <- model.matrix(~0 +factor(c(2,1,1,2,2,1)))
>   colnames(design) <- c("group1", "group2")
>   contrast.matrix <- makeContrasts(group2-group1, levels=design)
>   exprs(temp)<- log2(exp(exprs(temp)))
>   fit <- lmFit(temp, design)
>   fit2 <- contrasts.fit(fit, contrast.matrix)
>   fit2 <- eBayes(fit2)
>   topTable(fit2, adjust="BH")
>    
>             ID     logFC  AveExpr         t      P.Value    adj.P.Val        B
>   168840 -5.302214 12.51356 -27.83821 3.590400e-09 2.717735e-05 11.23314
>   101140 -5.175045 12.51114 -27.67796 3.756415e-09 2.717735e-05 11.20135
>   123375 -4.265825 14.04540 -27.65702 3.778740e-09 2.717735e-05 11.19717
>   194271 -4.237919 16.31592 -27.05930 4.483564e-09 2.717735e-05 11.07557
>    
>   > signals_ord[1:2, 1:6]
>   282357                        191.85                        542.75                        644.07                        544.29
>   224689                         66.95                        134.27                        127.79                        102.97
>      
>   282357                        119.78                        936.58
>   224689                         72.42                        357.10
>   > signals[1:2, 1:6]
>   282357                        936.58                        542.75                        644.07                        544.29
>   224689                        357.10                        134.27                        127.79                        102.97
>      
>   282357                        119.78                        191.85
>   224689                         72.42                         66.95
>   > exprs(temp_ord)[1:2,1:6]
>   282357                      8.011335                      8.393353                      8.384370                      8.176880
>   224689                      7.058134                      7.339350                      7.396429                      7.254554
>    
>   282357                      7.550265                      8.338583
>   224689                      7.228878                      7.677248
>   > exprs(temp)[1:2,1:6]
>   282357                     10.621015                     10.675780                     10.666816                     10.459286
>   224689                      9.959617                      9.621677                      9.678802                      9.536861
>    
>   282357                      9.832487                     10.293657
>   224689                      9.511012                      9.340247
>    
>   > sessionInfo()
>   R version 2.4.1 (2006-12-18) 
>   i386-pc-mingw32 
>    
>   locale:
>   LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>    
>   attached base packages:
>   [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
>    
>   other attached packages:
>        vsn  Biobase    limma 
>   "1.12.0" "1.12.2"  "2.9.8"
> 
> 
>        
> ---------------------------------
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list