[BioC] question about get a summary gene expression information from the probe set associated with one specific gene
James W. MacDonald
jmacdon at med.umich.edu
Mon Jan 3 19:00:40 CET 2011
Hi Xiaowei,
On 12/30/2010 12:55 PM, Xiaowei Guan wrote:
> Dear Bioconductor,
>
> I have a question about how to compress a gene expression dataset from
> probe set-level values to gene-level values.
>
> The analysis has two simultaneous goals: the first is to convert the probe
> sets to gene names; the second is to convert the probe set values into just
> one summary expression value for the associated gene.
>
> For example, we have 21 probes that correspond to only 4 genes. Is there
> any package that will derive a summary value for the probe sets
> corresponding to a specific gene?
Your terminology is not very exact here. If I assume you are using
Affymetrix chips, then a probe is quite different from a probe set. If I
further assume that any time you say 'probe' you actually mean probe set
(e.g., 'we have 21 probe [sets] that correspond to only 4 genes'), then
there are a couple of ways you can go here, each with its own positive
and negative aspects.
You could use one of the MBNI re-mapped CDF packages, which map the
probes to genes, transcripts, etc. (depending on the package), so each
probe set uniquely measures a single entity. The positive aspect of
these CDF packages is that you no longer have multiple probe sets for
each gene. The negative aspect is that the number of probes per probe
set is highly variable, so the accuracy of the measurements will vary as
well (and this is usually not accounted for in downstream analyses).
Alternatively, you could choose just one probe set for each gene, based
on something like the greatest variability between your sample types, or
the largest difference. There is a function findLargest() in the
genefilter package that can help, and there may be others as well. The
positive aspect of this approach is that you again have only one probe
set per gene. The negative is that you are making the (unfounded, IMO)
assumption that you can determine which probe set is measuring a given
gene using such a simple criterion.
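
To make the second option concrete, here is a minimal sketch of "keep
the most variable probe set per gene". It is written in plain Python
only so that it is self-contained; it is not what findLargest() does
internally, and the function name, probe set IDs, gene names and
expression values are all made up for illustration.

```python
from statistics import variance


def pick_one_probe_set_per_gene(expr, probe_to_gene):
    """Return {gene: probe_set_id}, keeping the most variable probe set per gene.

    expr          -- dict: probe set ID -> list of expression values (one per sample)
    probe_to_gene -- dict: probe set ID -> gene name (hypothetical annotation)
    """
    best = {}  # gene -> (variance, probe set ID) of the current winner
    for probe_set, values in expr.items():
        gene = probe_to_gene.get(probe_set)
        if gene is None:
            continue  # probe sets with no gene assignment are skipped
        score = variance(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe_set)
    return {gene: probe_set for gene, (_, probe_set) in best.items()}


# Toy data: four probe sets, two annotated genes, one unannotated probe set.
expr = {
    "ps_1": [7.1, 7.3, 7.2, 7.0],
    "ps_2": [5.0, 8.0, 5.5, 8.5],
    "ps_3": [6.0, 6.1, 9.0, 9.2],
    "ps_4": [4.0, 4.0, 4.1, 4.0],
}
probe_to_gene = {"ps_1": "GENE_A", "ps_2": "GENE_A", "ps_3": "GENE_B"}

selected = pick_one_probe_set_per_gene(expr, probe_to_gene)
print(selected)
```

Any other per-probe-set statistic (a t-statistic, a fold change) could
replace the variance here; the caveat above about trusting such a simple
criterion applies whichever one you pick.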
>
> Another question: if there are no gene assignments for our data (only
> probe names), is there any way to assign genes to each probe at the same
> time as getting the summary information for the probe set? When I
> downloaded the annotation file, I noticed there are some probes which have
> two or more gene names, and in such cases we want to have two identical
> columns with different gene names.
If there are no gene assignments for a probe set, that is most likely
because it doesn't measure any currently annotated transcript. Most of
the Affy chips were designed years ago, and given the fluid nature of
gene annotations it isn't unusual for some probe sets to no longer be
considered to measure a known gene.
As for your second question, that's easy. Just make another row and tack
on the second gene name. It's probably not advisable, however. In these
cases you can't be sure what the probe set is measuring, so attributing
the data to two genes is a fairly risky proposition.
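
If you do go that way, a minimal sketch of the row duplication might
look like the following. Again this is plain Python with made-up names
and values (expand_to_gene_rows, the IDs and the numbers are all
hypothetical); it also flags the duplicated rows so the ambiguity stays
visible downstream.

```python
def expand_to_gene_rows(expr, probe_to_genes):
    """Return one (gene, probe_set, is_ambiguous, values) row per gene assignment.

    expr           -- dict: probe set ID -> list of expression values
    probe_to_genes -- dict: probe set ID -> list of gene names (may be empty)
    """
    rows = []
    for probe_set, values in expr.items():
        genes = probe_to_genes.get(probe_set, [])
        for gene in genes:
            # A probe set annotated to several genes is copied once per gene
            # and marked ambiguous, since we cannot tell what it measures.
            rows.append((gene, probe_set, len(genes) > 1, list(values)))
    return rows


# Toy data: ps_2 is annotated to two genes, ps_3 to none.
expr = {
    "ps_1": [7.1, 7.3],
    "ps_2": [5.0, 8.0],
    "ps_3": [6.0, 6.1],
}
probe_to_genes = {"ps_1": ["GENE_A"], "ps_2": ["GENE_B", "GENE_C"], "ps_3": []}

rows = expand_to_gene_rows(expr, probe_to_genes)
for row in rows:
    print(row)
```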
Best,
Jim
> Thank you so much!
>
> Best,
> Xiaowei
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826