[BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
Stefanie Busch
stefanie.busch2 at web.de
Wed May 7 13:21:58 CEST 2014
Dear Gordon,
Thank you for your answer. I still have a few questions.
1. > Two colour arrays don't return expression values. Instead they return
> log-ratios, which are stored in M. When you compare Agilent to Affymetrix
> Chips and Illumina Beadarrays, you need to compare log-fold-changes and DE
> results, not expression values.
What does "DE results" mean? And what should I do with the Affymetrix chips or
the Illumina BeadArray? I preprocessed the Affymetrix chips with rma, which
already applies a log transformation, doesn't it? The Illumina array was
background corrected, then log-transformed and finally quantile normalized
with the lumi package.
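"DE" stands for differential expression. rma() already returns log2-scale expression values, so for a single-channel platform a log-fold-change is simply the difference between group means on that log scale. The base-R sketch below assumes the Experiment 1 values for Bglap, Copd and Sirt1 in the table under point 2 are log2 expression values. The per-gene Welch t-test is only a simple stand-in; it is not the moderated t-test that limma would compute.

```r
# Experiment 1 values from the table under point 2 (3 controls, 3 diet samples).
# Assumption: these are log2 expression values (e.g. after rma or lumi).
expr <- rbind(
  Bglap = c(2.8, 2.4, 2.7, 3.3, 3.66, 3.1),
  Copd  = c(5.4, 7.2, 5.8, 4.3, 5.0,  4.9),
  Sirt1 = c(7.0, 6.5, 7.2, 7.3, 7.1,  6.7)
)
group <- factor(c("control", "control", "control", "diet", "diet", "diet"),
                levels = c("control", "diet"))

# log-fold-change = mean(diet) - mean(control), computed on the log2 scale
logFC <- rowMeans(expr[, group == "diet"]) - rowMeans(expr[, group == "control"])

# Simple per-gene Welch t-test p-value (a stand-in for limma's moderated t-test)
p_value <- apply(expr, 1, function(x)
  t.test(x[group == "diet"], x[group == "control"])$p.value)

de_results <- data.frame(logFC = round(logFC, 3), p_value = signif(p_value, 3))
print(de_results[order(de_results$p_value), ])
```

With only three replicates per group, limma's lmFit() followed by eBayes() gives moderated statistics that are more stable than ordinary t-tests. In either case, the resulting logFC and DE results are on a common scale across platforms, whereas the raw expression values are not.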
2. > > For comparing the results of the different studies I want to use the
> > package RankProd.
> As far as I know, RankProd assesses differential expression and does not in
> itself help you compare one study to another. The usual methods to compare
> one study to another are (i) to make a scatterplot of logFC from the two
> experiments or (ii) to use a gene set test such as roast() in the limma
> package. The limma package can compute logFC for whatever comparison you are
> making.
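Option (i) can be sketched in base R. Align the logFC vectors of two experiments by a shared gene identifier and keep only the genes measured in both. Then check the direction of change, compute the correlation and plot one study against the other. The Bglap, Copd and Sirt1 values are group-mean differences computed from the example table under point 2. GeneX is a hypothetical gene present in only one study.

```r
# logFC (diet vs control) per gene in two experiments; GeneX is hypothetical
lfc_study1 <- c(Bglap = 0.72, Copd = -1.40, Sirt1 = 0.13, GeneX = 0.50)
lfc_study2 <- c(Bglap = 0.12, Copd = -0.02, Sirt1 = 0.16)

# Keep only genes measured in both studies, in the same order
common <- intersect(names(lfc_study1), names(lfc_study2))
paired <- data.frame(study1 = lfc_study1[common],
                     study2 = lfc_study2[common],
                     row.names = common)

# Does each gene change in the same direction in both studies?
paired$same_direction <- sign(paired$study1) == sign(paired$study2)
print(paired)
print(cor(paired$study1, paired$study2))

# Scatterplot of logFC from the two experiments (only drawn in an interactive session)
if (interactive()) {
  plot(paired$study1, paired$study2,
       xlab = "logFC study 1", ylab = "logFC study 2")
  abline(h = 0, v = 0, lty = 2)
}
```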
I don't want to compare the studies directly. I want to take the results of
all experiments and get a list of genes that are up- or downregulated across
all studies, and I think RankProd is a good choice for this. To do this, I
build a big Excel table that looks like the one below. I have seven different
experiments, so a gene such as Bglap may not be measured on every chip;
RankProd will ignore the missing values.
        Experiment 1                        | Experiment 2
        con1  con2  con3  diet1 diet2 diet3 | con1  con2  con3  con4  con5  diet1 diet2 diet3 diet4 diet5
Bglap   2.8   2.4   2.7   3.3   3.66  3.1   | 5.1   6.6   6.2   6.6   6.3   5.9   6.5   6.4   5.7   6.9
Copd    5.4   7.2   5.8   4.3   5     4.9   | 3     2.7   4     3.5   4.2   4.3   3.5   3.9   2.5   3.1
Sirt1   7     6.5   7.2   7.3   7.1   6.7   | 4.5   3.7   4.2   4.6   4.1   4.2   4.5   4.8   4.5   3.9
...
cl     <- c(0,0,0, 1,1,1, 0,0,0,0,0, 1,1,1,1,1)  # class labels: 0 = control, 1 = diet
origin <- c(1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2)    # the samples come from two different experiments
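Building the combined table in R rather than in Excel makes it easy to check that the class labels and origins line up with the columns before RankProd is called. The sketch below assumes that the 16 columns are ordered as in the table above, and that RankProd codes control samples as 0 and treated samples as 1, as its documentation describes. The column names are hypothetical. The RPadvance() call is left as a comment because it needs the RankProd package. Check its arguments against ?RPadvance for the installed version.

```r
# Example values from the table above: 6 columns from Experiment 1, 10 from Experiment 2
expr <- rbind(
  Bglap = c(2.8, 2.4, 2.7, 3.3, 3.66, 3.1,  5.1, 6.6, 6.2, 6.6, 6.3, 5.9, 6.5, 6.4, 5.7, 6.9),
  Copd  = c(5.4, 7.2, 5.8, 4.3, 5.0,  4.9,  3.0, 2.7, 4.0, 3.5, 4.2, 4.3, 3.5, 3.9, 2.5, 3.1),
  Sirt1 = c(7.0, 6.5, 7.2, 7.3, 7.1,  6.7,  4.5, 3.7, 4.2, 4.6, 4.1, 4.2, 4.5, 4.8, 4.5, 3.9)
)
# Hypothetical column names that record experiment and group
colnames(expr) <- c(paste0("E1_con", 1:3), paste0("E1_diet", 1:3),
                    paste0("E2_con", 1:5), paste0("E2_diet", 1:5))

cl     <- c(rep(0, 3), rep(1, 3), rep(0, 5), rep(1, 5))  # 0 = control, 1 = diet
origin <- c(rep(1, 6), rep(2, 10))                        # experiment of each column

# Sanity checks: one label per column, and both classes present in every experiment
stopifnot(length(cl) == ncol(expr), length(origin) == ncol(expr))
print(table(origin, cl))

# With RankProd installed, the call would look roughly like this (check ?RPadvance):
# library(RankProd)
# rp <- RPadvance(expr, cl, origin, num.perm = 100, logged = TRUE,
#                 gene.names = rownames(expr))
```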
My aim is to obtain a list of up- and downregulated genes for intervention a
(7 experiments, intervention a vs. control) and a list of up- and
downregulated genes for intervention b (3 experiments, intervention b vs.
control), to see whether there are genes that are up- or downregulated by
both interventions.
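Once each intervention has its own up and down lists, for example from two separate rank-product analyses, finding the genes shared by both interventions is a matter of set operations. All list contents below are hypothetical placeholders.

```r
# Hypothetical result lists (gene symbols) for interventions a and b
up_a   <- c("Bglap", "Gene1", "Gene2")
down_a <- c("Copd", "Gene3")
up_b   <- c("Bglap", "Gene4")
down_b <- c("Gene3", "Gene2")

# Genes changed in the same direction by both interventions
up_both   <- intersect(up_a, up_b)
down_both <- intersect(down_a, down_b)

# Genes changed in opposite directions by the two interventions
opposite <- union(intersect(up_a, down_b), intersect(down_a, up_b))

print(list(up_both = up_both, down_both = down_both, opposite = opposite))
```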
3. > For this purpose, I always recommend that, for each Entrez ID, you use
> the probe on each platform with the highest overall expression level.
Example (the three controls and the three diet samples are replicates within
each group):

        control1  control2  control3  diet1  diet2  diet3
Bglap   2.5       3.2       3.1       3.9    4.8    3.1
Bglap   1         0.7       0.9       1.2    0.7    1
Bglap   4.9       3.3       4.1       4.8    5.5    5.2
So should I only take the last row? Is there an R command to filter for
these rows in the Affymetrix or Illumina data?
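The same selection rule also works on a plain expression matrix, for example exprs() of the ExpressionSet returned by rma() or of the lumi object. The rule is to order the rows by their average over all arrays, then keep the first (brightest) row for each gene. This is a base-R re-implementation of the order() and duplicated() steps in Gordon's recipe. It is not a function from limma, affy or lumi. The sketch assumes the gene ID of every row is already available as a vector, for example taken from the platform annotation. The probe IDs and the GeneX row are hypothetical. With the example rows from point 3, the third Bglap row has the highest mean (about 4.63 versus 3.43 and 0.92), so it is the one kept.

```r
# Example rows from point 3 (3 controls, 3 diet), plus one hypothetical second gene
expr <- rbind(
  c(2.5, 3.2, 3.1, 3.9, 4.8, 3.1),
  c(1.0, 0.7, 0.9, 1.2, 0.7, 1.0),
  c(4.9, 3.3, 4.1, 4.8, 5.5, 5.2),
  c(6.0, 6.2, 5.9, 6.4, 6.1, 6.3)   # hypothetical probe for GeneX
)
rownames(expr) <- c("probe_1", "probe_2", "probe_3", "probe_4")  # hypothetical probe IDs
gene <- c("Bglap", "Bglap", "Bglap", "GeneX")                     # gene ID of each row

# Keep, for each gene, the probe with the highest average expression over all arrays
keep_top_probe <- function(expr, gene) {
  o <- order(rowMeans(expr), decreasing = TRUE)   # brightest probes first
  expr <- expr[o, , drop = FALSE]
  gene <- gene[o]
  keep <- !is.na(gene) & !duplicated(gene)        # first (= brightest) probe of each gene
  list(expr = expr[keep, , drop = FALSE], gene = gene[keep])
}

res <- keep_top_probe(expr, gene)
print(cbind(res$expr, mean = round(rowMeans(res$expr), 2)))
```

The is.na() term drops probes without a gene ID. Without it, duplicated() would keep one unannotated probe as if NA were a gene.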
4. > The rationale of this is that you are using the probe that represents
> the dominant transcript for that gene in the cell type. This method has been
> used for many published studies by now, the first of which may have been:
> http://www.biomedcentral.com/1471-2105/7/511 For example, you can proceed
> like this for the Agilent data, assuming you have put the EntrezIDs into the
> object:
>
>     RG  <- backgroundCorrect(RG, method="normexp")    # normexp background correction
>     MA  <- normalizeWithinArrays(RG, method="loess")  # loess normalization within each array
>     A   <- rowMeans(MA$A)                             # average log-intensity of each probe
>     o   <- order(A, decreasing=TRUE)                  # brightest probes first
>     MA2 <- MA[o,]
>     d   <- duplicated(MA2$genes$EntrezID)             # flag all but the brightest probe per EntrezID
>     MA2 <- MA2[!d,]
>
> Now you have a data object with a unique probe for each EntrezID.
This command doesn't work with my example,
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23523. When I have
finished all the steps, there is no value left in my list. I think the
problem could be that MA$EntrezID is missing.
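The sketch below shows a likely reason for the empty result. It uses a plain matrix and data frame as stand-ins for the MAList and its genes component. If no EntrezID column has been added, MA$genes$EntrezID is NULL. duplicated(NULL) then returns a zero-length logical vector, and indexing rows with it selects nothing. duplicated() must also be applied to the reordered object (MA2); otherwise the flags refer to the wrong rows. The probe names, log-ratios, intensities and the annotation table are all hypothetical. For GSE23523 the real probe-to-Entrez mapping would have to come from the platform annotation.

```r
# Stand-in for MA$genes: probe annotation WITHOUT an EntrezID column
genes <- data.frame(ProbeName = c("p1", "p2", "p3", "p4"),
                    stringsAsFactors = FALSE)               # hypothetical probe names
M <- matrix(c(0.5, 1.2, -0.3, 0.8), ncol = 1)               # hypothetical log-ratios
A <- c(7.1, 9.4, 5.2, 8.8)                                  # hypothetical average log-intensities

# The missing column is NULL, so the filter has nothing to work with
d <- duplicated(genes$EntrezID)
print(length(d))                    # 0
print(nrow(M[!d, , drop = FALSE]))  # 0 rows left: the "empty list"

# Fix: add Entrez IDs from a probe-to-gene annotation table (hypothetical values)
annot <- data.frame(ProbeName = c("p1", "p2", "p3", "p4"),
                    EntrezID  = c("1001", "1001", NA, "1002"),
                    stringsAsFactors = FALSE)
genes$EntrezID <- annot$EntrezID[match(genes$ProbeName, annot$ProbeName)]

# Then order by A and de-duplicate on the REORDERED IDs, dropping probes without an ID
o      <- order(A, decreasing = TRUE)
M2     <- M[o, , drop = FALSE]
genes2 <- genes[o, , drop = FALSE]
keep   <- !is.na(genes2$EntrezID) & !duplicated(genes2$EntrezID)
M2     <- M2[keep, , drop = FALSE]
genes2 <- genes2[keep, , drop = FALSE]
print(cbind(genes2, M = M2[, 1]))
```

With the real MAList, the corresponding assignment might look like MA$genes$EntrezID <- annot$EntrezID[match(MA$genes$ProbeName, annot$ProbeName)]. This assumes an annotation table annot built from the GEO platform record and a ProbeName column in MA$genes, as is typical for Agilent files read with read.maimages().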
Kind regards
Stefanie
Sent: Sunday, 04 May 2014 at 06:50
From: "Gordon K Smyth" <smyth at wehi.EDU.AU>
To: "Stefanie Busch" <stefanie.busch2 at web.de>
Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
Subject: 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
Dear Stephanie,

> Date: 30 April 2014
> From: Pekka Kohonen
> From: Stefanie Busch
> To: Bioconductor
> Subject: Re: [BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
>
> Hello,
>
> I have two questions and I hope you can help me.
>
> I want to compare several studies with similar design but different
> arrays. The first step was to quantile normalize all data, which works
> well except for the two color experiment with an Agilent chip.

As you seem to have realized already, quantile normalization is not usually
appropriate for a two colour Agilent array. Loess normalization is generally
used for two colour arrays, and I recommend a normexp background correction
step before that.

> I read the limma User Guide and found out that I must preprocess with the
> function normalizeBetweenArrays. So I get M- and A-values, and my
> question is which one shows the expression values for this experiment?

Two colour arrays don't return expression values. Instead they return
log-ratios, which are stored in M. When you compare Agilent to Affymetrix
Chips and Illumina Beadarrays, you need to compare log-fold-changes and DE
results, not expression values.
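For reference, limma computes the M and A values of each spot on a two-colour array from its red (R) and green (G) channel intensities as:

$$
M = \log_2 R - \log_2 G = \log_2 \frac{R}{G}, \qquad
A = \frac{1}{2}\left(\log_2 R + \log_2 G\right)
$$

So M is the log-ratio between the two channels, and A is the average log-intensity of the spot. A is the quantity that the probe-selection recipe averages with rowMeans(MA$A) to rank probes by overall expression level.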
> For comparing the results of the different studies I want to use the
> package RankProd.

As far as I know, RankProd assesses differential expression and does not in
itself help you compare one study to another. The usual methods to compare
one study to another are (i) to make a scatterplot of logFC from the two
experiments or (ii) to use a gene set test such as roast() in the limma
package. The limma package can compute logFC for whatever comparison you are
making.

> For a better comparison between the studies I used the Entrez IDs, and I
> downloaded the latest chip information directly from Affymetrix and
> Illumina. This revealed a new problem. For example, on the Affymetrix
> Mouse Genome 430 2.0 Array the ID 1449880_s_at stands for three gene names
> and Entrez IDs: Bglap /// Bglap2 /// Bglap3 - 12095 /// 12096 /// 12097.
> On the Illumina chip each gene has a single array ID:
>
> Gene       Illumina ID    Entrez ID
> Bglap-rs1  ILMN_1233122   12095
> Bglap1     ILMN_2610166   12096
> Bglap2     ILMN_2944508   12097
>
> So I don't know what I should do to compare the results of these two
> experiments. When I paste the expression values of 1449880_s_at three
> times with the three different Entrez IDs, the ranking calculated with the
> RankProd package changes.
> Example:
>
> Chip ID       Entrez ID  Control 1  Control 2  etc.
> 1449880_s_at  12095      3.855      4.211      ...
> 1449880_s_at  12096      3.855      4.211      ...
> 1449880_s_at  12097      3.855      4.211      ...
>
> The other possibility is to combine the three expression values of the
> Illumina chip into one value. But I don't know if that is the right way.
> What is the better way?

For this purpose, I always recommend that, for each Entrez ID, you use the
probe on each platform with the highest overall expression level. The
rationale of this is that you are using the probe that represents the
dominant transcript for that gene in the cell type. This method has been used
for many published studies by now, the first of which may have been:
http://www.biomedcentral.com/1471-2105/7/511 [1]

For example, you can proceed like this for the Agilent data, assuming you
have put the EntrezIDs into the object:

    RG  <- backgroundCorrect(RG, method="normexp")    # normexp background correction
    MA  <- normalizeWithinArrays(RG, method="loess")  # loess normalization within each array
    A   <- rowMeans(MA$A)                             # average log-intensity of each probe
    o   <- order(A, decreasing=TRUE)                  # brightest probes first
    MA2 <- MA[o,]
    d   <- duplicated(MA2$genes$EntrezID)             # flag all but the brightest probe per EntrezID
    MA2 <- MA2[!d,]

Now you have a data object with a unique probe for each EntrezID.

Simply averaging the probes or probe-sets is not generally recommended,
because different probes for the same gene can have quite different
behaviour. A common situation is that one probe successfully probes an
expressed transcript while another probe is essentially unexpressed.

Best wishes
Gordon

> Kind regards
> Stefanie Busch
References
1. http://www.biomedcentral.com/1471-2105/7/511