[BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
Stefanie Busch
stefanie.busch2 at web.de
Wed May 7 11:47:13 CEST 2014
Hi Pekka,
I have now read about the custom CDFs and they sound very good, but I still
have a few questions. The first problem is installing the custom CDF package.
I downloaded the package from BrainArray and tried to install it with the
command:
install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz", repos=NULL, type="source")
and I get the following error message:
Installing package into ‘C:/Users/Julie/Documents/R/win-library/3.0’
(as ‘lib’ is unspecified)
* installing *source* package 'CustomCDF' ...
** libs
*** arch - i386
ERROR: compilation failed for package 'CustomCDF'
* removing 'C:/Users/Julie/Documents/R/win-library/3.0/CustomCDF'
Warning messages:
1: running command '"C:/PROGRA~1/R/R-30~1.2/bin/x64/R" CMD INSTALL -l "C:\Users\Julie\Documents\R\win-library\3.0" "C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz"' had status 1
2: In install.packages("C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz", :
  installation of package ‘C:/Users/Julie/documents/R/win-library/3.0/CustomCDF_1.2.1.tar.gz’ had non-zero exit status
So I tried downloading the chip information directly. I took the CDF file,
version 18, for the Affymetrix Mouse Genome 430 2.0 Array ([1]Mouse4302).
As I mentioned, I want to use Entrez IDs, so I took the ENTREZG package
(mouse4302mmentrezgcdf). Installing this package works fine, but I am
confused to see that there are only 17607 genes/affyids:
data<-ReadAffy(verbose=TRUE,filenames=cels,cdfname="mouse4302mmentrezgcdf")
> data
AffyBatch object
size of arrays=1002x1002 features (47 kb)
cdf=mouse4302mmentrezgcdf (17607 affyids)
number of samples=96
number of genes=17607
annotation=mouse4302mmentrezgcdf
notes=
When I don't specify a CDF, I get more affyids:
data2<-ReadAffy(verbose=TRUE,filenames=cels)
> data2
AffyBatch object
size of arrays=1002x1002 features (47 kb)
cdf=Mouse430_2 (45101 affyids)
number of samples=96
number of genes=45101
annotation=mouse4302
notes=
Doesn't using the new CDF file mean a loss of information?
2. I have a question about the median. The median of what?
Until now I have done this:
Example
(these are replicates within the same group)
       control1  control2  control3  diet1  diet2  diet3
Bglap  2,5       3,2       3,1       3,9    4,8    3,1
Bglap  1         0,7       0,9       1,2    0,7    1
Bglap  4,9       3,3       4,1       4,8    5,5    5,2
Mean value
       con1  con2  con3  diet1  diet2  diet3
Bglap  2,8   2,4   2,7   3,3    3,66   3,1
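The averaging step above can be sketched in base R roughly as follows. The object names (probes, bglap_mean) are only illustrative, and the values are the three Bglap rows from the example, written with decimal points instead of commas.

```r
# Three Bglap rows from the example above (decimal points instead of commas).
# 'probes' and 'bglap_mean' are illustrative names, not part of any package.
probes <- rbind(
  c(2.5, 3.2, 3.1, 3.9, 4.8, 3.1),
  c(1.0, 0.7, 0.9, 1.2, 0.7, 1.0),
  c(4.9, 3.3, 4.1, 4.8, 5.5, 5.2)
)
colnames(probes) <- c("con1", "con2", "con3", "diet1", "diet2", "diet3")

# One value per sample: the mean over the three rows
bglap_mean <- colMeans(probes)
round(bglap_mean, 2)
```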
For these values I calculated the p-value with a Wilcoxon test, and then I
want to compare the results of the different experiments with RankProd. So I
put all values into one big Excel table and loaded it into R. The table looks
like this:
Experiment 1 (columns 1-6)
       con1  con2  con3  diet1  diet2  diet3
Bglap  2,8   2,4   2,7   3,3    3,66   3,1
Copd   5,4   7,2   5,8   4,3    5      4,9
Sirt1  7     6,5   7,2   7,3    7,1    6,7
...
Experiment 2 (columns 7-16 of the same table)
       con1  con2  con3  con4  con5  diet1  diet2  diet3  diet4  diet5
Bglap  5,1   6,6   6,2   6,6   6,3   5,9    6,5    6,4    5,7    6,9
Copd   3     2,7   4     3,5   4,2   4,3    3,5    3,9    2,5    3,1
Sirt1  4,5   3,7   4,2   4,6   4,1   4,2    4,5    4,8    4,5    3,9
...
cl <- c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
origin <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
Do I need to give RankProd different values?
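For reference, a minimal sketch of how a table like this could be read into R when it is exported as tab-separated text with decimal commas. The inline text stands in for the exported file and contains only the six Experiment 1 columns; the names txt, tab and expr are illustrative.

```r
# Tab-separated text with decimal commas, standing in for an exported sheet
# (only the six Experiment 1 columns of the example are used here).
txt <- "gene\tcon1\tcon2\tcon3\tdiet1\tdiet2\tdiet3
Bglap\t2,8\t2,4\t2,7\t3,3\t3,66\t3,1
Copd\t5,4\t7,2\t5,8\t4,3\t5\t4,9
Sirt1\t7\t6,5\t7,2\t7,3\t7,1\t6,7"

# dec = "," makes R parse 2,8 as the number 2.8
tab <- read.delim(text = txt, dec = ",", row.names = 1)
expr <- as.matrix(tab)

cl <- c(1, 1, 1, 2, 2, 2)   # 1 = control, 2 = diet
origin <- rep(1, 6)         # all six columns come from experiment 1

# The class and origin vectors need one entry per column of the matrix
stopifnot(ncol(expr) == length(cl), length(cl) == length(origin))
```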
Kind regards
Stefanie
Sent: Friday, 02 May 2014 at 14:26
From: "Pekka Kohonen" <pkpekka at gmail.com>
To: "Stefanie Busch" <stefanie.busch2 at web.de>
Cc: Bioconductor <bioconductor at r-project.org>
Subject: Re: [BioC] 1. comparing chip Information in meta analysis /
Rankprod and 2. two color normalization
Hi Stefanie,
You could map the Affymetrix identifiers to a single Entrez/Ensembl
identifier using the "custom CDFs" from "BrainArray". You can do the
normalization with, for instance, the "simpleaffy" package. If the
Agilent/Illumina chips have duplicate probes for some genes, you can
just take the median of the fold-change values and use those in the
RankProd package. It is best to have just one identifier/gene per
array, although having more than one is not strictly forbidden.
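A minimal base R sketch of the median step, assuming the fold changes are already on a log scale and sit in a matrix with one row per probe. The probe names, the probe-to-gene mapping and the numbers are made up for illustration.

```r
# Illustrative log2 fold-change values; probe and column names are made up.
lfc <- matrix(
  c( 0.4,  0.6,
     1.0,  0.2,
     0.7,  0.5,
    -0.3, -0.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("probeA", "probeB", "probeC", "probeD"),
                  c("study1", "study2"))
)
# Gene that each probe maps to (hypothetical mapping)
gene <- c("Bglap", "Bglap", "Bglap", "Sirt1")

# Median per gene within every column; the result has one row per gene
collapsed <- apply(lfc, 2, function(x) tapply(x, gene, median))
collapsed
```

The collapsed matrix then has exactly one row per gene identifier, which is the form suggested above for RankProd.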
Custom CDF manuscript:
[2]http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200
Another package that might be useful is RankAggreg, although I have not
used it myself:
[3]http://www.biomedcentral.com/1471-2105/10/62
Generally using rank-based analysis can lead to significant results
that have very small effect sizes (fold-change). So you should use
fold change to filter the results to some extent as well.
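As a rough sketch of such a filter in base R: the result table, its column names (log2FC, pfp) and both cutoffs are assumptions chosen for illustration, not output of any particular package.

```r
# Made-up result table; column names and thresholds are assumptions.
res <- data.frame(
  gene   = c("Bglap", "Copd", "Sirt1"),
  log2FC = c(1.4, -0.2, -1.1),
  pfp    = c(0.01, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Keep genes that pass both the significance and the effect-size cutoff
keep <- res$pfp < 0.05 & abs(res$log2FC) >= 1
res[keep, ]
```

Here Copd is significant but has a tiny fold change, and Sirt1 has a large fold change but is not significant, so only Bglap is kept.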
Best, Pekka
2014-04-30 11:36 GMT+02:00 Stefanie Busch <stefanie.busch2 at web.de>:
>
> Hello,
>
> I have two questions and I hope you can help me.
>
> I want to compare several studies with similar designs but different
> arrays. The first step was to quantile normalize all data, which works
> well except for the two-color experiment with an Agilent chip. I read the
> limma User Guide and found out that I must preprocess with the function
> normalizeBetweenArrays. So I get M- and A-values, and my question is which
> one represents the expression values for this experiment?
>
> To compare the results of the different studies I want to use the
> RankProd package. For a better comparison between the studies I used the
> Entrez IDs, and I downloaded the latest chip information directly from
> Affymetrix and Illumina. This reveals a new problem. For example, on the
> Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at stands for three
> gene names and Entrez IDs: Bglap /// Bglap2 /// Bglap3 - 12095 /// 12096
> /// 12097. On the Illumina chip each gene has its own array ID:
> Bglap-rs1 - ILMN_1233122 - 12095
> Bglap1 - ILMN_2610166 - 12096
> Bglap2 - ILMN_2944508 - 12097
>
> So I don't know what I should do to compare the results of these two
> experiments. When I paste the expression values of 1449880_s_at three
> times with the three different Entrez IDs, the ranking calculated by
> the RankProd package changes.
> Example:
> Chip ID Entrez-Id Control1 control 2 etc.
> 1449880_s_at - 12095 - 3,855 - 4,211 ...
> 1449880_s_at - 12096 - 3,855 - 4,211 ...
> 1449880_s_at - 12097 - 3,855 - 4,211 ...
>
> The other possibility is to combine the three expression values of the
> Illumina chip into one value. But I don't know if this is the right way.
> Which is the better way?
>
> Kind regards
> Stefanie Busch
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> [4]https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> [5]http://news.gmane.org/gmane.science.biology.informatics.conductor
References
1. http://www.affymetrix.com/support/technical/byproduct.affx?product=moe430-20
2. http://www.ncbi.nlm.nih.gov/pubmed/?term=16284200
3. http://www.biomedcentral.com/1471-2105/10/62
4. https://stat.ethz.ch/mailman/listinfo/bioconductor
5. http://news.gmane.org/gmane.science.biology.informatics.conductor