[BioC] Fw: duplicate genes in Affy arrays

Suresh Gopalan gopalans at comcast.net
Fri Aug 19 17:00:48 CEST 2005


 Hi Adrien

 "biologically": has to do with the work flow from collecting the eukaryotic
 sample to the point where you have intensity data, in this kind of
 experiment.

 "pitfall" (one of them): try taking a look at a few probesets of this nature
 (multiple for each transcript) in some experiments, using your favorite
 expression summary measure, and see what you get.  It also depends on the
 stage of analysis at which you want to deal with this issue.
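
 A minimal sketch of that kind of check, in base R on made-up numbers. The
 probe set IDs, gene symbols and intensities are all hypothetical, and the
 names expr, symbols, by_mean and by_max are introduced only for
 illustration. It shows the probe sets that share a symbol, then collapses
 them two ways (averaging, and keeping the probe set with the highest mean
 expression) so the two results can be compared.

```r
# Toy expression matrix: rows are probe sets, columns are samples.
# All IDs, symbols and values here are made up for illustration.
expr <- matrix(
  c(8.1, 8.3, 7.9,
    5.2, 5.0, 5.5,
    9.4, 9.6, 9.1,
    6.0, 6.2, 5.8,
    7.3, 7.1, 7.4),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4", "ps5"),
                  c("s1", "s2", "s3"))
)

# Hypothetical probe set -> gene symbol mapping (GENEA has 3 probe sets).
symbols <- c(ps1 = "GENEA", ps2 = "GENEA", ps3 = "GENEA",
             ps4 = "GENEB", ps5 = "GENEC")

# Which symbols are represented by more than one probe set?
symbol.usage <- table(symbols)
dup.symbols <- names(symbol.usage)[symbol.usage > 1]

# Look at the duplicated probe sets side by side.
print(expr[symbols %in% dup.symbols, , drop = FALSE])

# Option 1: average the probe sets that share a symbol.
by_mean <- t(sapply(split(seq_along(symbols), symbols),
                    function(i) colMeans(expr[i, , drop = FALSE])))

# Option 2: keep, for each symbol, the probe set with the highest
# mean expression across samples.
mean.expr <- rowMeans(expr)
best <- sapply(split(names(symbols), symbols),
               function(ids) ids[which.max(mean.expr[ids])])
by_max <- expr[best, , drop = FALSE]
rownames(by_max) <- names(best)

# Compare the two summaries for the duplicated gene.
print(rbind(mean = by_mean["GENEA", ], max = by_max["GENEA", ]))
```

 On real data the symbols vector would come from the array annotation. The
 toy example carries no positional information, so it cannot say which
 probe set is the 3' most one.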

 Suresh

 Suresh Gopalan, Ph.D.


> ----- Original Message ----- 
> From: "Jamain, Adrien J" <adrien.jamain at imperial.ac.uk>
> To: "Suresh Gopalan" <gopalans at comcast.net>
> Sent: Friday, August 19, 2005 5:25 AM
> Subject: RE: [BioC] duplicate genes in Affy arrays
>
>
>
> Dear Suresh,
>
> Sorry, stupid question here. Why is it "biologically sound" to use the
> 3' most probeset? Is it because this way you are certain of having all the
> "unfinished" transcripts? What are the pitfalls of this approach (apart
> from discarding a lot of information)?
>
> I had a brief look at your article and couldn't find an explanation.
>
> Thanks,
> Adrien
>
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
>> Suresh Gopalan
>> Sent: 19 August 2005 03:23
>> To: jsv at stat.ohio-state.edu; bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] duplicate genes in Affy arrays
>>
>> Hi
>>
>> I don't know if there is a consensus on this issue yet.  When
>> I dealt with this for a categorical over-representation
>> analysis, I used the 3' most probeset
>> (www.pnas.org/cgi/doi/10.1073/pnas.0501211102). There are
>> pitfalls to this approach too, though it is biologically
>> sound. The other approach I have seen implemented in one
>> software package is to use the probeset with the highest
>> expression.
>>
>> As to the last question, it depends.  Based on published
>> articles using whole genome tiling arrays and listening to
>> the current interpretation, the answer could be tricky.
>>
>> Suresh
>>
>> Suresh Gopalan, Ph.D.
>>
>> ----- Original Message -----
>> From: <jsv at stat.ohio-state.edu>
>> To: <bioconductor at stat.math.ethz.ch>
>> Sent: Thursday, August 18, 2005 7:50 AM
>> Subject: [BioC] duplicate genes in Affy arrays
>>
>>
>> > Is there any general procedure for handling duplicate genes in Affy
>> > arrays?
>> >
>> > For example, for the hu6800 array which has 7129 probe sets,
>> > there are 869 genes that are represented by more than one probe set,
>> > with one gene (ACTB) being represented by 9 probe sets.
>> >
>> > library(annaffy)  # provides aafSymbol()
>> > # X.gnames is assumed to hold the 7129 probe set IDs
>> > g.symbols <- aafSymbol(X.gnames, "hu6800")
>> > ug.symbols <- unlist(g.symbols)
>> > length(ug.symbols)   # 6980 (7129 - 6980 = 149 with no symbols)
>> > symbol.usage <- table(ug.symbols)
>> > sum(symbol.usage > 1)  # 869
>> > max(symbol.usage)      # 9
>> >
>> > Ignoring this would seem to invalidate a number of multiple comparison
>> > procedures.  Is it reasonable to average probe set expression levels for
>> > the same gene?  Are there any "pre-processing" routines that address this
>> > issue?
>> >
>> > The flip side of this question is "Do probe sets with the same gene symbol
>> > really specify the same gene? Does it matter which annotational method is
>> > used to name genes?"
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >
>>
>


