[BioC] a question about LIMMA
Francois Pepin
fpepin at cs.mcgill.ca
Thu Jun 12 18:24:38 CEST 2008
Hi Erika,
The comments that I made were about replicate probes with the same probe
ID, not about the cases where different probes point to the same gene.
In the first case, I would expect the same result and Gordon's new
function should deal with it very well.
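
For that first case, Gordon's earlier suggestion in this thread (average over the replicates for each probe, after normalisation and before lmFit()) could be sketched in base R as below. This is only an illustration: average_replicates is a hypothetical helper, not a limma function, and the matrix and probe IDs are made up.

```r
# Hypothetical helper: average the rows of a normalised log-ratio matrix
# that share the same probe ID. Missing values are ignored in the average.
# M:   numeric matrix, probes in rows, arrays in columns
# ids: character vector with one probe ID per row of M
average_replicates <- function(M, ids) {
  stopifnot(is.matrix(M), length(ids) == nrow(M))
  present <- !is.na(M)
  M0 <- M
  M0[!present] <- 0                                    # NAs contribute nothing to the sums
  sums   <- rowsum(M0, ids, reorder = FALSE)           # per-ID column sums
  counts <- rowsum(present * 1, ids, reorder = FALSE)  # per-ID non-missing counts
  sums / counts
}

# Made-up example: probeA is spotted twice, probeB and probeC once.
M <- matrix(c( 1.0,  1.2,
               0.5,  0.8,
              -0.3, -0.1,
               2.0,  2.2),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("array1", "array2")))
ids <- c("probeA", "probeB", "probeA", "probeC")

avg <- average_replicates(M, ids)
print(avg)
```

The result has one row per unique probe ID, in order of first appearance, and could then be passed on to the model fitting step.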
In the second case, as we have here, I would expect large differences as
you are seeing. Part of this comes from the way Agilent does the probe
selection. They will only keep multiple probes for a gene if they see
some differences in their values. This could be due to alternative
splice sites, start and stop codons, etc.
I think that you are correct that the main explanation is how far a
probe is from the polyA tail.
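
To make the distance argument concrete, here is a small worked example using the probe positions and F635 median intensities from your message quoted below. It assumes that position 2701 of the transcript is the 3' end, which the BLAST coordinates suggest but do not prove.

```r
# Probe positions and background-subtracted F635 medians, copied from the
# quoted message. The transcript length is assumed to mark the 3' end.
transcript_length <- 2701
probes <- data.frame(
  probe       = c("A_42_P757370", "A_44_P481629", "A_44_P311955"),
  start       = c(2305, 2400, 425),
  end         = c(2364, 2459, 484),
  f635_median = c(3207, 2566, 404)
)

# Distance, in bases, from the end of each probe to the 3' end.
probes$dist_from_3prime <- transcript_length - probes$end

# Order the probes from nearest to farthest from the 3' end.
print(probes[order(probes$dist_from_3prime), ])
```

The two probes within a few hundred bases of the 3' end have signals in the thousands, while the probe more than 2000 bases away is the one with the weak signal.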
In our main data set, we use laser-capture microdissection, which means
that we need to do 2 rounds of amplification to get enough material for
the hybridization. We probably see an even stronger 3' bias than you do.
Anything that gives you longer fragments would help in reducing this effect.
Some of the newer kits can help with this. I can give them to you
off-list if you want (this does not really concern many people here).
There are a few ways to deal with the problem. One of the simpler ones is
to select a single probe to be representative of a gene, often by
selecting the most variable one. This is especially important when doing
analyses such as GO that expect a single measurement per gene. You could
also filter out the probes that you believe have no signal (by low
variance, IQR, etc.) and keep the rest.
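
Both options can be sketched in a few lines of base R. The helper name most_variable_probe, the toy matrix, the gene labels and the IQR cutoff of 0.5 are all made up for illustration; a sensible cutoff depends on the data.

```r
# Hypothetical helper: for each gene, return the row index of the probe
# with the largest variance across arrays.
# M:     numeric matrix of log-ratios, probes in rows, arrays in columns
# genes: character vector with one gene label per row of M
most_variable_probe <- function(M, genes) {
  stopifnot(is.matrix(M), length(genes) == nrow(M))
  v <- apply(M, 1, var, na.rm = TRUE)
  by_gene <- split(seq_len(nrow(M)), genes)
  vapply(by_gene, function(i) i[which.max(v[i])], integer(1))
}

# Made-up example: two probes for each of two genes, three arrays.
M <- rbind(p1 = c( 0.1,  0.2, 0.1),
           p2 = c(-1.0,  0.0, 1.5),
           p3 = c( 0.4,  0.5, 0.6),
           p4 = c( 2.0, -2.0, 0.0))
genes <- c("geneA", "geneA", "geneB", "geneB")

# Option 1: one representative probe per gene.
keep <- most_variable_probe(M, genes)
selected <- M[keep, , drop = FALSE]
print(selected)

# Option 2: drop probes with little variation, using an arbitrary IQR cutoff.
iqr <- apply(M, 1, IQR, na.rm = TRUE)
expressed <- M[iqr > 0.5, , drop = FALSE]
print(expressed)
```

Option 1 guarantees exactly one row per gene, which is what gene-level analyses such as GO expect. Option 2 can leave several probes for the same gene, or none.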
We routinely blast the probe sequence before we go on with further
validation to make sure it is likely to give us a real signal, but as I
said, this affects us more than most.
Francois Pepin
Erika Melissari wrote:
> Dear Francois,
>
> I followed your suggestion about checking the behavior of replicate
> probes, using an old experiment performed with Agilent 44k rat
> whole genome arrays.
> I have found a "strange" result.
> I copy for you only one of these strange results.
>
> Gene   Accession  Probe         F635 Median  F532 Median  F635 Mean  F532 Mean  norm ratio
>                                 - B635       - B532       - B635     - B532
> Prkce  NM_017171  A_42_P757370  3207         3264         3013       2967       1.210832876
>        NM_017171  A_44_P481629  2566         2461         2605       2544       0.756496281
>        NM_017171  A_44_P311955  404          457          427        454        0.002685538
>
>
> This concerns the Prkce gene; the values are GenePix 6.0
> background-subtracted signal intensities and LOESS-normalized ratios.
> As you can see, the third probe does not have the same signal as the
> others. I have checked the spot image, but there is no problem with it
> and, honestly, there is no problem anywhere on the array.
> At this point, I thought it would be a good idea to align these
> probes against the rat Prkce transcript using BLAST. I found that the
> first two probes are located near the 3' end of this gene, whereas the
> third is far away from this end.
> To be more precise these are the results:
>
> Prkce RNA length 1-2701 (5'->3')
> A_42_P757370 2305-2364
> A_44_P481629 2400-2459
> A_44_P311955 425-484
>
> The third probe is very close to the 5' end!
> I have found a similar situation for other replicate probes.
> So I think there is a problem in cDNA synthesis: perhaps the reverse
> transcriptase is not able to copy the whole transcript, and for this
> reason the third probe has a signal so much lower than, and different
> from, the other two probes.
> In your opinion, is this a good explanation?
> For this old experiment we did not follow the full Agilent protocol.
> In particular, we did not use Agilent's Quick Amp Labeling protocol;
> we preferred the Amino Allyl MessageAmp aRNA Amplification Kit by Ambion.
> Do you use the whole Agilent system, including the Agilent kit?
> Have you noticed any problem similar to that shown by our data?
>
> Thank you for your attention and for your kind help.
>
> Best Regards,
>
> Erika
>
> ----- Original Message ----- From: "Francois Pepin" <fpepin at cs.mcgill.ca>
> To: "Erika Melissari" <erika.melissari at bioclinica.unipi.it>
> Cc: <bioconductor at stat.math.ethz.ch>
> Sent: Thursday, June 05, 2008 18:30 PM
> Subject: Re: [BioC] a question about LIMMA
>
>
>> Dear Erika,
>>
>> please include the bioconductor list in your replies. That way other
>> people can chime in and people with the same question in the future can
>> find the posts in the archives.
>>
>> You might want to try the arrayQualityMetrics package for your QC also.
>>
>> It depends on what you mean by "handle". Differential expression is only
>> one of the operations generally done with microarrays, after all. I
>> generally use limma for differential expression, but other packages are
>> available.
>>
>> I do not do any kind of sorting with the RG object. As I said, I check
>> the duplicate probes to make sure they're the same, but I otherwise
>> ignore them. Gordon's new function is probably what I would use now to
>> deal with them.
>>
>> You might want to read up on what the Loess normalization does. This
>> kind of repositioning would have no effect at all, as the neighborhood
>> is defined by the relative intensities of the spots.
>> ?normalizeWithinArrays suggests papers that describe those methods in
>> more detail. In general, you never want to try to "help" those methods
>> along unless you really understand what they do. You risk invalidating
>> your results if you do so.
>>
>> Francois
>>
>> Erika Melissari wrote:
>>> Dear Francois,
>>>
>>> thank you very much for your help.
>>> About arrays, I mean 4x44k Agilent arrays, but we have already used 44k
>>> whole rat Agilent arrays.
>>> Agilent's Feature Extraction software performs a quality control
>>> procedure based on replicate spots to produce a measure of
>>> reproducibility (%CV) on the array, but it is not free of charge.
>>> I have another question, please.
>>> What package do you use to handle microarray data?
>>> If you use the LIMMA package, do you sort the RG file to put replicate
>>> probes close together before you normalize?
>>> When the LOESS normalization method is used, the M value may depend on
>>> its "neighbors" in the smoothing window, so putting the replicate probes
>>> close together might guard against a "bad" normalization effect.
>>>
>>> Thank you
>>>
>>> Erika
>>>
>>>
>>> ----- Original Message ----- From: "Francois Pepin"
>>> <fpepin at cs.mcgill.ca>
>>> To: "Gordon K Smyth" <smyth at wehi.EDU.AU>
>>> Cc: "Erika Melissari" <erika.melissari at bioclinica.unipi.it>;
>>> <bioconductor at stat.math.ethz.ch>
>>> Sent: Wednesday, June 04, 2008 17:39 PM
>>> Subject: Re: [BioC] a question about LIMMA
>>>
>>>
>>>> Dear Erika,
>>>>
>>>> Are you talking about the whole genome 44k (or 4x44k) arrays?
>>>>
>>>> In our situation (with the arrays mentioned above), we have found those
>>>> replicate probes to behave in a virtually identical manner, to the
>>>> point where we arbitrarily select one of the probes and simply ignore
>>>> the rest.
>>>>
>>>> As Gordon was saying, you can simply average the values. We have found
>>>> this not to be necessary, but it would definitely not hurt.
>>>>
>>>> I do not know of any package to use them for quality control. If you
>>>> see one replicate that is really different from the others, you would
>>>> likely worry about the array.
>>>>
>>>> Francois
>>>>
>>>> Gordon K Smyth wrote:
>>>>> Dear Erika,
>>>>>
>>>>> limma doesn't explicitly handle irregular replicates. (In my lab, we
>>>>> haven't had to work with any of the new generation of Agilent arrays
>>>>> yet, so haven't had to solve the issues with them.)
>>>>>
>>>>> Your best bet may be to simply average over the replicates for each
>>>>> probe, after normalisation, and before using lmFit(). This is not
>>>>> hard, but requires some programming in R.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>> On Tue, 3 Jun 2008, Erika Melissari wrote:
>>>>>
>>>>>> Dear Dr Smyth,
>>>>>>
>>>>>> I am a PhD student at the University of Pisa. I frequently use the
>>>>>> LIMMA package to handle gene expression microarray data. I have a
>>>>>> question about how LIMMA manages spot copies. I know that LIMMA needs
>>>>>> all spots on the array to be present in the same number of copies
>>>>>> (e.g. each spot in duplicate). My research group is just starting a
>>>>>> project in which we use Agilent microarrays (high-density
>>>>>> microarrays), and on these arrays only one block of probes,
>>>>>> positioned in a random fashion, is present in more than one spot per
>>>>>> probe. Moreover, the number of copies is not the same for each probe
>>>>>> in this block, so we do not have regularly spaced replicate spots on
>>>>>> the array. Please check the gal file for the human Agilent
>>>>>> microarrays sent as an email attachment, in which I highlighted some
>>>>>> spots in red (but not all...) to better explain this situation. Is
>>>>>> LIMMA able to manage this situation? That is, is LIMMA able to use
>>>>>> this kind of randomly replicated spots to perform a quality control
>>>>>> procedure, to fit the linear model and to produce a unique fold
>>>>>> change value for each probe? Can I use any kind of strategy to solve
>>>>>> this problem? Does a free package exist that does this?
>>>>>>
>>>>>> Thank you very much for any information about this topic.
>>>>>>
>>>>>> Best Regards
>>>>>>
>>>>>> Erika Melissari
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor