[BioC] probe summarization
James W. MacDonald
jmacdon at med.umich.edu
Thu Sep 6 20:42:23 CEST 2007
Hi Bogdan,
Bogdan Tanasa wrote:
> Hi James,
>
> I used the following instructions in R (mydata <- ReadAffy(), mycomp <-
> gcrma(mydata), write.table(mycomp, "mytext.txt", sep="\t")),
> or I called "mydata <- expresso(..., summary.method="medianpolish",
> ...)". In the results table, I obtained an expression value
> per PROBE, and I would like to have an expression value per GENE. I know
> that RMA/GCRMA could use median polish to summarize
> the probes for a gene and to ask the question more specifically: is there
> anything that the code I use is missing ? In the final results
> table, I would like to have the expression values for 10000-12000 genes
> instead of having expression values for 22000 probes. Thanks,
There is a bit of terminology here that is incorrect. You have
expression values for 22283 _probesets_, which are based on ~250000 probes.
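To make the probe/probeset distinction concrete, here is a minimal sketch of how a median polish turns the probes of one probeset into a single value per array. It uses only medpolish() from base R on invented numbers; the probe count, the intensities and the variable names are assumptions for illustration, and this is not the actual rma()/gcrma() code, which also does background correction and normalization first.

```r
# Toy log2 intensities for ONE probeset: 11 probes (rows) x 4 arrays (columns).
# All numbers are invented for illustration only.
set.seed(1)
probe_effect <- rnorm(11, sd = 0.5)        # probe-specific affinity
array_level  <- c(7.0, 7.2, 9.1, 9.0)      # "true" expression on each array
pm <- outer(probe_effect, array_level, "+") +
  matrix(rnorm(44, sd = 0.1), nrow = 11)   # small measurement noise

# Median polish fits: intensity = overall + probe effect + array effect + residual
fit <- medpolish(pm, trace.iter = FALSE)

# One summarized value per array for this probeset
probeset_expr <- fit$overall + fit$col
print(round(probeset_expr, 2))
```

The eleven rows collapse to one number per array, which is why the output of gcrma() already has one row per probeset rather than one row per probe.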
You are correct, however, that there is some duplication, since several
probesets can map to the same gene. How you deal with that duplication is
not a trivial question to answer. I suppose the easiest thing to do would
be to use the MBNI re-mapped cdfs that we supply. For instance, to use the
Entrez Gene remapped cdf you would do something like this:
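# Read the CEL files using the Entrez Gene remapped cdf
dat <- ReadAffy(cdfname="hs133av2hsentrezgcdf")
# gcrma() also needs the matching probe package
biocLite("hs133av2hsentrezgprobe")
eset <- gcrma(dat)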
As with all things, there are positive and negative aspects to using the
MBNI cdfs. The bad is that the number of probes per probeset is now
highly variable, so one would usually want to have standard errors that
could be propagated through to any differential expression calculations.
I think the puma package might be useful here, but I haven't tried it yet.
You could also make the assumption that the probeset that has the
largest statistic in whatever comparison you are making is 'the right
one', and simply use that. The findLargest() function in genefilter is
useful in that respect.
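As a rough sketch of that second idea, the selection can be written in a few lines of base R. The probeset IDs, Entrez Gene IDs, statistics and the helper name pick_largest() below are all hypothetical, and this is not the implementation of findLargest(), just the same idea spelled out.

```r
# Hypothetical inputs: one test statistic per probeset, and a map from
# probeset ID to Entrez Gene ID (both invented for illustration).
stat <- c("201000_at" = 1.2, "201001_s_at" = -4.5, "201002_at" = 0.3,
          "201003_x_at" = 2.8, "201004_at" = -0.9)
gene <- c("201000_at" = "1017", "201001_s_at" = "1017", "201002_at" = "1017",
          "201003_x_at" = "7157", "201004_at" = "7157")

# For each gene, keep the probeset with the largest absolute statistic.
# Probesets with no gene annotation (NA) are dropped by split().
pick_largest <- function(stat, gene) {
  stopifnot(identical(names(stat), names(gene)))
  idx <- split(seq_along(stat), gene)
  vapply(idx, function(i) names(stat)[i[which.max(abs(stat[i]))]],
         character(1))
}

chosen <- pick_largest(stat, gene)
print(chosen)
```

Note that choosing the probeset by the same statistic you then report is a form of selection, so the resulting p-values should be read with that in mind.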
Best,
Jim
>
> Bogdan
>
>
>
> # Read Affy CEL files
> data <- ReadAffy()
> # Normalize and do summarization using gcrma
> eset <- gcrma (data)
> #
> # Now eset contains all the information that you require
> #
> # to get a matrix of expression values, use exprs command
> evals <- exprs (eset)
> #
> # The command below will tell you that it is a matrix
> class (evals)
> #
> # You can write out tab-separated expression values to be used by other
> # programs using the command
> write.table (evals, "expressvals.txt", sep="\t")
> #
> #
> Send me questions if you have any
>
> On 9/6/07, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>
>>Hi Bogdan,
>>
>>Bogdan Tanasa wrote:
>>
>>>Hi all,
>>>
>>>I would like to ask for some information: I am carrying out the array
>>>analysis for a large dataset (40 samples * 2 replicates);
>>>the arrays are Affy U133A, and I use GCRMA and invariant set normalization.
>>>Please could you let me know
>>>the way I could do the probe summarization for these arrays. Thanks and best
>>
>>
>>GCRMA _is_ a method to do probe summarization. Maybe you are asking a
>>different question?
>>
>>Best,
>>
>>Jim
>>
>>
>>
>>>regards,
>>>
>>>Bogdan
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor at stat.math.ethz.ch
>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>Search the archives:
>>
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>--
>>James W. MacDonald
>>University of Michigan
>>Affymetrix and cDNA Microarray Core
>>1500 E Medical Center Drive
>>Ann Arbor MI 48109
>>734-647-5623
>>
>>
>
>
--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623