[BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization

Wed May 7 11:47:13 CEST 2014

   Hi Pekka,

   I have now read about the custom CDFs and they sound very good, but I
   still have a few questions. The first problem is installing the custom CDF
   package. I downloaded the package from BrainArray and tried to install it
   with the command:
   install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz",
   repos=NULL, type="source")
   and I get the following error message:

   Installing package into ‘C:/Users/Julie/Documents/R/win-library/3.0’
   (as ‘lib’ is unspecified)
   * installing *source* package 'CustomCDF' ...
   ** libs
   *** arch - i386
   ERROR: compilation failed for package 'CustomCDF'
   * removing 'C:/Users/Julie/Documents/R/win-library/3.0/CustomCDF'
   Warning messages:
   1: running command '"C:/PROGRA~1/R/R-30~1.2/bin/x64/R" CMD INSTALL -l
   "C:\Users\Julie\Documents\R\win-library\3.0"
   "C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz"' had
   status 1
   2: In install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz",  :
     installation of package
   ‘C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz’ had
   non-zero exit status

   So I tried downloading the chip information directly. I took the CDF file,
   version 18, for the Affymetrix Mouse Genome 430 2.0 Array ([1]Mouse4302).
   As I mentioned, I want to use Entrez IDs, so I took ENTREZG
   (mouse4302mmentrezgcdf). Installing that package works fine, but I am
   puzzled to see that there are only 17607 genes/affyids:

   data<-ReadAffy(verbose=TRUE,filenames=cels,cdfname="mouse4302mmentrezgcdf")
   > data
   AffyBatch object
   size of arrays=1002x1002 features (47 kb)
   cdf=mouse4302mmentrezgcdf (17607 affyids)
   number of samples=96
   number of genes=17607
   annotation=mouse4302mmentrezgcdf
   notes=

   When I do not specify a CDF, I get more affyids:

   data2<-ReadAffy(verbose=TRUE,filenames=cels)
   > data2
   AffyBatch object
   size of arrays=1002x1002 features (47 kb)
   cdf=Mouse430_2 (45101 affyids)
   number of samples=96
   number of genes=45101
   annotation=mouse4302
   notes=

   Doesn't using the new CDF file mean a loss of information?

   2. I have a question about the median. The median of what?

   So far I have done this:
   Example (these are replicates for the same group):
              control1  control2  control3  diet1  diet2  diet3
   Bglap      2,5       3,2       3,1       3,9    4,8    3,1
   Bglap      1         0,7       0,9       1,2    0,7    1
   Bglap      4,9       3,3       4,1       4,8    5,5    5,2

   Mean value:
              con1      con2      con3      diet1  diet2  diet3
   Bglap      2,8       2,4       2,7       3,3    3,66   3,1
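
The averaging step above can be reproduced in a few lines of base R. This is only a sketch: the numbers are copied from the example table (decimal commas written as points), the row labels are hypothetical, and the median is shown next to the mean because it was suggested earlier in the thread as an alternative way to collapse duplicate rows.

```r
# Values copied from the example table (decimal commas written as points).
bglap <- matrix(
  c(2.5, 3.2, 3.1, 3.9, 4.8, 3.1,
    1.0, 0.7, 0.9, 1.2, 0.7, 1.0,
    4.9, 3.3, 4.1, 4.8, 5.5, 5.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Bglap.1", "Bglap.2", "Bglap.3"),  # hypothetical labels for the duplicate rows
    c("con1", "con2", "con3", "diet1", "diet2", "diet3")
  )
)

# One value per sample: mean or median over the three duplicate rows.
bglap_mean   <- colMeans(bglap)
bglap_median <- apply(bglap, 2, median)

print(round(bglap_mean, 2))
print(bglap_median)
```

The means match the "Mean value" row. The medians differ noticeably here, because the second row is much lower than the other two.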

   For these values I calculated the p-value with the Wilcoxon test, and then
   I want to compare the results of the different experiments with RankProd.
   So I put all values into one big Excel table and loaded it into R. The
   table looks like this:
           Experiment 1                                Experiment 2
           con1   con2   con3   diet1  diet2  diet3    con1   con2   con3   con4   con5   diet1  diet2  diet3  diet4  diet5
   Bglap   2,8    2,4    2,7    3,3    3,66   3,1      5,1    6,6    6,2    6,6    6,3    5,9    6,5    6,4    5,7    6,9
   Copd    5,4    7,2    5,8    4,3    5      4,9      3      2,7    4      3,5    4,2    4,3    3,5    3,9    2,5    3,1
   Sirt1   7      6,5    7,2    7,3    7,1    6,7      4,5    3,7    4,2    4,6    4,1    4,2    4,5    4,8    4,5    3,9
   ...

   cl <- c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
   origin <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
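
The two label vectors can also be built from the group sizes rather than typed out, which makes it easier to check that they line up with the 16 data columns. A minimal sketch in base R follows; the names n_con and n_diet are hypothetical helpers. As far as I recall, the two-class functions in RankProd expect the class labels 0 and 1 rather than 1 and 2, so the last line shows that recoding; that is an assumption to check against the package help pages.

```r
n_con  <- c(3, 5)  # number of control samples in experiment 1 and 2
n_diet <- c(3, 5)  # number of diet samples in experiment 1 and 2

# Class label per sample (1 = control, 2 = diet), experiment by experiment.
cl <- c(rep(c(1, 2), times = c(n_con[1], n_diet[1])),
        rep(c(1, 2), times = c(n_con[2], n_diet[2])))

# Experiment of origin per sample.
origin <- rep(c(1, 2), times = n_con + n_diet)

# Both vectors must have one entry per data column.
stopifnot(length(cl) == 16, length(origin) == 16)
print(table(origin, cl))

# Assumed recoding to 0/1 labels (0 = control, 1 = diet).
cl01 <- cl - 1
```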

   Do I need to supply different values for RankProd?

   Kind regards
   Stefanie

   Sent: Friday, 02 May 2014 at 14:26
   From: "Pekka Kohonen" <pkpekka at gmail.com>
   To: "Stefanie Busch" <stefanie.busch2 at web.de>
   Cc: Bioconductor <bioconductor at r-project.org>
   Subject: Re:  [BioC]  1. comparing chip Information in meta analysis /
   Rankprod and 2. two color normalization
   Hi Stefanie,
   You could map the Affymetrix identifiers to a single Entrez/Ensembl
   identifier using the "custom CDFs" from "BrainArray". You can do the
   normalization for instance using the "simpleaffy" package. If the
   Agilent/Illumina chips have duplicate probes for some genes, you can
   just take the median of the fold-change values and use those in the
   RankProd package. It is best to have just one identifier/gene per
   array, although having more than one is not strictly forbidden.
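
As a rough sketch of the median step, assuming the fold changes sit in a data frame with one row per probe and an Entrez ID column (the column names and all numbers below are invented for illustration; only the three Entrez IDs come from this thread), duplicate probes can be collapsed with base R's aggregate():

```r
# Hypothetical log2 fold changes: one row per probe, several probes per gene.
fc <- data.frame(
  entrez = c("12095", "12095", "12096", "12097", "12097", "12097"),
  exp1   = c(0.4, 0.6, -1.2, 0.1, 0.3, 0.9),
  exp2   = c(0.5, 0.7, -0.8, 0.2, 0.2, 1.0)
)

# Median per gene and per column; the result has one row per Entrez ID.
fc_gene <- aggregate(fc[, c("exp1", "exp2")],
                     by = list(entrez = fc$entrez),
                     FUN = median)
print(fc_gene)
```

The collapsed table has exactly one row per gene, which is the form suggested above for RankProd.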
   Custom CDF manuscript:
   [2]http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200
   Another package that might be useful is RankAggreg, although I have not
   used it myself:
   [3]http://www.biomedcentral.com/1471-2105/10/62
   Generally using rank-based analysis can lead to significant results
   that have very small effect sizes (fold-change). So you should use
   fold change to filter the results to some extent as well.
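
A minimal sketch of such a combined filter, in base R. The result table, its column names and both cutoffs are assumptions made up for illustration; sensible thresholds depend on the data.

```r
# Hypothetical per-gene results: a rank-based p-value and a log2 fold change.
res <- data.frame(
  gene   = c("Bglap", "Copd", "Sirt1"),
  pvalue = c(0.001, 0.004, 0.200),
  log2fc = c(0.15, -1.30, 0.90)
)

p_cut  <- 0.05  # assumed significance threshold
fc_cut <- 1     # assumed minimum absolute log2 fold change (two-fold)

# Keep genes that pass both the significance and the effect-size cutoff.
keep <- res$pvalue < p_cut & abs(res$log2fc) >= fc_cut
print(res[keep, ])
```

In this toy table the first gene is highly significant but is dropped because its fold change is tiny, which is the situation described above.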
   Best, Pekka
   2014-04-30 11:36 GMT+02:00 Stefanie Busch <stefanie.busch2 at web.de>:
   >
   > Hello,
   >
   > I have two questions and I hope you can help me.
   >
   > I want to compare several studies with similar design but different
   > arrays.
   > The first step was to quantile normalize all data, which works well except
   > for the two-color experiment with an Agilent chip. I read the limma User
   > Guide and found out that I must preprocess with the function
   > normalizeBetweenArrays. So I get M- and A-values, and my question is which
   > one represents the expression values for this experiment?
   >
   > To compare the results of the different studies I want to use the
   > RankProd package. To make the studies easier to compare I used
   > Entrez IDs, and I downloaded the latest chip annotation directly from
   > Affymetrix and Illumina. This revealed a new problem. For example, on the
   > Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at stands for three
   > gene names and Entrez IDs: Bglap /// Bglap2 /// Bglap3 - 12095 /// 12096
   > /// 12097. On the Illumina chip each gene has its own array ID:
   > Bglap-rs1 - ILMN_1233122 - 12095
   > Bglap1 - ILMN_2610166 - 12096
   > Bglap2 - ILMN_2944508 - 12097
   >
   > So I don't know what I should do to compare the results of these two
   > experiments. When I paste the expression values of 1449880_s_at three
   > times, once for each of the three Entrez IDs, the ranking calculated by
   > the RankProd package changes.
   > Example:
   > Chip ID Entrez-Id Control1 control 2 etc.
   > 1449880_s_at - 12095 - 3,855 - 4,211 ...
   > 1449880_s_at - 12096 - 3,855 - 4,211 ...
   > 1449880_s_at - 12097 - 3,855 - 4,211 ...
   >
   > The other possibility is to combine the three expression values from the
   > Illumina chip into one value, but I don't know if that is the right way.
   > Which is the better way?
   >
   > Kind regards
   > Stefanie Busch
   > _______________________________________________
   > Bioconductor mailing list
   > Bioconductor at r-project.org
   > [4]https://stat.ethz.ch/mailman/listinfo/bioconductor
   > Search the archives:
   > [5]http://news.gmane.org/gmane.science.biology.informatics.conductor
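
On the M- and A-values asked about in the quoted message: in limma's two-color convention, M is the log2 ratio of the red to the green channel and A is the average log2 intensity of the two channels. The sketch below uses invented intensities (R and G are hypothetical) to show how both are computed and how the per-channel log2 intensities can be recovered from them. Whether M or the single-channel values are the right input depends on how the samples were assigned to the two dyes.

```r
# Hypothetical background-corrected red (R) and green (G) intensities for four spots.
R <- c(1200, 300, 5000,  800)
G <- c( 600, 300, 2500, 1600)

M <- log2(R) - log2(G)        # log-ratio: red relative to green
A <- (log2(R) + log2(G)) / 2  # average log-intensity of the spot

# The per-channel log2 intensities can be recovered from M and A.
logR <- A + M / 2
logG <- A - M / 2

print(data.frame(M = M, A = A, logR = logR, logG = logG))
```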

References

   1. http://www.affymetrix.com/support/technical/byproduct.affx?product=moe430-20
   2. http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200
   3. http://www.biomedcentral.com/1471-2105/10/62
   4. https://stat.ethz.ch/mailman/listinfo/bioconductor
   5. http://news.gmane.org/gmane.science.biology.informatics.conductor