[BioC] question about getting summary gene expression information from the probe sets associated with one specific gene

Mon Jan 3 19:00:40 CET 2011

Hi Xiaowei,

On 12/30/2010 12:55 PM, Xiaowei Guan wrote:
> Dear Bioconductor,
>
> I have a question about how to collapse a gene expression dataset from
> probe-set-level values to gene-level values.
>
> The analysis has two simultaneous goals: first, to convert the probe set
> IDs to gene names; second, to convert the probe set values into a single
> summary expression value for the associated gene.
>
>   For example, we have 21 probes that correspond to only 4 genes. Is there
> any package that will derive a summary value from the probe sets
> corresponding to a specific gene?

You are not using very exacting language here. If I assume you are using 
Affymetrix chips, then a probe is quite different from a probe set. If I 
further assume that any time you say 'probe' you actually mean probe set 
(e.g., 'we have 21 probe [sets] that correspond to only 4 genes'), then 
there are a couple of ways you can go here. And each has its own 
positive and negative aspects.

You could use one of the MBNI re-mapped CDF packages, which remap the
probes to genes, transcripts, etc. (depending on the package), so that each
probe set uniquely measures a single entity. The positive aspect of
these CDF packages is that you no longer have multiple probe sets for
each gene. The negative aspect is that the number of probes per probe
set is highly variable, so the accuracy of the measurements will vary as
well (and this is usually not accounted for in downstream analyses).

Alternatively, you could choose just one probe set for each gene, based
on a criterion such as the greatest variability between your sample types
or the largest difference between them. The findLargest() function in the
genefilter package can help with this, and there may be others as well. The
positive aspect of this approach is that you again have only one probe set
per gene. The negative aspect is that you are making the (unfounded, IMO)
assumption that such a simple criterion can determine which probe set is
actually measuring a given gene.
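To make the one-probe-set-per-gene idea concrete, here is a rough sketch in Python. It is not R and not the genefilter API: findLargest() takes a per-probe-set test statistic that you supply, whereas this sketch assumes sample variance across all arrays as the criterion. The probe set IDs (ps1 ... ps4) and gene symbols (GENE_A, GENE_B) are made-up placeholders, and the function name pick_one_probe_set_per_gene is hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def pick_one_probe_set_per_gene(
    expr: Dict[str, List[float]],
    probe_to_gene: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """For each gene, keep the probe set whose values vary most across samples.

    expr maps probe set ID -> expression values (one per sample, at least two);
    probe_to_gene maps probe set ID -> gene symbol (None or absent if unannotated).
    """
    best: Dict[str, str] = {}
    best_var: Dict[str, float] = {}
    for probe_set, values in expr.items():
        gene = probe_to_gene.get(probe_set)
        if not gene:
            # No gene assignment: nothing to collapse onto, so skip it.
            continue
        # Assumed criterion: sample variance across all arrays. A statistic
        # comparing sample types (e.g. an absolute t-statistic) would also work.
        var = statistics.variance(values)
        if gene not in best or var > best_var[gene]:
            best[gene] = probe_set
            best_var[gene] = var
    return best


# Hypothetical toy data: ps1 and ps2 both map to GENE_A; ps4 is unannotated.
expr = {
    "ps1": [1.0, 1.1, 0.9],
    "ps2": [1.0, 5.0, 9.0],
    "ps3": [2.0, 2.0, 2.5],
    "ps4": [3.0, 7.0, 1.0],
}
probe_to_gene = {"ps1": "GENE_A", "ps2": "GENE_A", "ps3": "GENE_B"}

chosen = pick_one_probe_set_per_gene(expr, probe_to_gene)
gene_level = {gene: expr[ps] for gene, ps in chosen.items()}
print(chosen)  # {'GENE_A': 'ps2', 'GENE_B': 'ps3'}
```

Note that the sketch silently discards the other probe sets for each gene, which is exactly the assumption criticized above: the "winning" probe set is chosen by a simple numeric criterion, not by any evidence that it measures the gene better.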

>
> Another question: if there are no gene assignments for our data (only
> probe names here), is there any way to assign genes to each probe at the
> same time as getting the summary information for the probe set? When I
> downloaded the annotation file, I noticed that some probes have two or
> more gene names; in such cases, we want to have two identical columns
> with different gene names.

If there are no gene assignments for a probe set, that is likely because
it doesn't measure any known transcript. Most of the Affy chips were
designed years ago, and given the fluid nature of gene annotations, it
isn't unusual for some probe sets to no longer be considered to measure a
known gene.

As for your second question, that's easy. Just make another row and tack 
on the second gene name. It's probably not advisable, however. In these 
cases you aren't sure what the probe set is measuring, so attributing 
the data to two genes is a fairly risky proposition.
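A minimal sketch of the "make another row" approach, again in Python rather than R. It assumes the gene symbol field separates multiple symbols with " /// " and marks missing annotation with "---", as Affymetrix annotation CSV files typically do; check your own annotation file before relying on that. The function name expand_multi_gene_rows and the toy IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_multi_gene_rows(
    expr: Dict[str, List[float]],
    symbols: Dict[str, str],
    sep: str = " /// ",   # assumed multi-symbol separator
    missing: str = "---",  # assumed placeholder for "no gene assigned"
) -> List[Tuple[str, str, List[float]]]:
    """Return one (probe_set, gene, values) row per gene assigned to a probe set.

    A probe set annotated with two genes yields two rows with identical values;
    probe sets with no gene assignment yield no rows at all.
    """
    rows: List[Tuple[str, str, List[float]]] = []
    for probe_set, values in expr.items():
        field = symbols.get(probe_set, "")
        genes = [g.strip() for g in field.split(sep)]
        genes = [g for g in genes if g and g != missing]
        for gene in genes:
            # Duplicate the same measurements under each assigned gene.
            rows.append((probe_set, gene, list(values)))
    return rows


# Hypothetical toy data: ps2 has two genes, ps3 has none.
expr = {"ps1": [1.0, 2.0], "ps2": [3.0, 4.0], "ps3": [5.0, 6.0]}
symbols = {"ps1": "GENE_A", "ps2": "GENE_B /// GENE_C", "ps3": "---"}

rows = expand_multi_gene_rows(expr, symbols)
for row in rows:
    print(row)
```

The duplicated rows carry identical values, so any downstream analysis will count that single measurement twice, once for each gene; that is the risk described above.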

Best,

Jim

> Thank you so much!
>
> Best,
> Xiaowei
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues