[BioC] normalization of illumina bead array data
michele caseposta
mic.cipi at gmail.com
Wed Jan 16 20:44:52 CET 2013
Hi Gordon,
Thanks for your help. I will use backgroundCorrect(...) then.
Should I still normalize the data after the background correction?
If so, what would be the best way to do it?
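For concreteness, this is the kind of pipeline I have in mind. It is a sketch on simulated intensities rather than the E-MTAB-380 files. It assumes limma is installed and ignores the detection column entirely. After backgroundCorrect() it applies the same log2 transformation and quantile normalization that neqc() applies after its own background correction, and it borrows the offset of 16 from neqc()'s default. Whether quantile normalization is the right choice for this dataset is what I am unsure about.

```r
library(limma)

set.seed(1)

# Simulated raw intensities for 1000 probes on 3 arrays, standing in for maqc$E:
# a normally distributed background plus an exponentially distributed signal.
n.probes <- 1000
E <- matrix(rnorm(3 * n.probes, mean = 100, sd = 10) +
              rexp(3 * n.probes, rate = 1 / 500),
            nrow = n.probes, dimnames = list(NULL, c("A1", "A2", "A3")))
x <- new("EListRaw", list(E = E))

# normexp background correction fitted to the intensities alone,
# without using detection p-values; offset = 16 as in neqc().
x.bc <- backgroundCorrect(x, method = "normexp", offset = 16, verbose = FALSE)

# For an EListRaw input, normalizeBetweenArrays() log2-transforms the
# intensities, quantile-normalizes them and returns an EList.
y <- normalizeBetweenArrays(x.bc, method = "quantile")
summary(y$E)
```

With the real data, `x` would simply be the `maqc` object returned by read.ilmn().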
On Jan 15, 2013, at 7:07 PM, Gordon K Smyth wrote:
> Dear Michele,
>
> On Tue, 15 Jan 2013, michele caseposta wrote:
>
>> Hi Wei,
>> I checked the data, and every array has probes with inconsistent detection values.
>
> That's what I suspected from your emails. If one array was affected, it seemed logical that all would be.
>
>> At this point I do not know if I should trust the data at all. Do you think that removing just the inconsistent probes would suffice?
>
> Definitely not. Whatever the creators of the data have done, whether mis-sorting the detection values or pre-processing the expression values presented in the raw file in some way, it is likely to have affected the entries for all the probes. So I would not personally trust the detection values at all.
>
> In limma, it may be wiser to use backgroundCorrect(method="normexp") instead of neqc().
>
> Best wishes
> Gordon
>
>> (The submitters have made it clear that they DO NOT want to take care of this.)
>>
>> Thanks for your help,
>> Michele
>>
>>
>>
>> On Jan 12, 2013, at 12:59 AM, Wei Shi wrote:
>>
>>> Dear Michele,
>>>
>>> Their data on ArrayExpress are not in BeadStudio format, so they cannot be loaded into BeadStudio to test whether there are errors.
>>>
>>> The link below points to the Illumina user guide, which describes how detection p-values are calculated (page 106).
>>>
>>> http://support.illumina.com/documents/MyIllumina/c94519f7-9348-4308-a32f-b66ff3959e99/GenomeStudio_GX_Module_v1.0_UG_11319121_RevA.pdf
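As a rough illustration of why intensity and detection should move together, the sketch below computes a simplified, rank-based detection p-value: the fraction of negative-control intensities at or above each probe's intensity. This definition is an assumption made for illustration, and the helper name `detection_p_sketch` and all numbers are hypothetical; page 106 of the guide gives Illumina's exact formula. A p-value defined this way can only stay the same or fall as intensity rises. In this simplified version the number of beads plays no role at all.

```r
# Hypothetical, simplified detection p-value: for each probe, the fraction of
# negative-control intensities that are greater than or equal to its intensity.
detection_p_sketch <- function(intensity, negctrl) {
  vapply(intensity, function(x) mean(negctrl >= x), numeric(1))
}

negctrl <- c(80, 90, 95, 100, 110)  # made-up negative-control intensities
probes  <- c(85, 100, 120, 300)     # made-up probe intensities, increasing

p <- detection_p_sketch(probes, negctrl)
p
# [1] 0.8 0.4 0.0 0.0
```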
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Wei
>>>
>>> On Jan 12, 2013, at 7:59 AM, michele caseposta wrote:
>>>
>>>> Dear Wei,
>>>> I am sorry to bother you again.
>>>> I contacted the authors who produced the data to ask for help with the dataset, but they were not cooperative.
>>>> All they told me was "we used BeadStudio and it gave us no errors".
>>>> This aside, I would like to know more about the relation between intensity and detection score. Is it a fixed relation, as you point out? If so, is there a place where I can read more about it? Is it possible that a probe A, composed of more beads, is more reliable than a probe B with fewer beads, even though the intensity of A is lower than the intensity of B?
>>>> Thanks,
>>>> Michele
>>>>
>>>> On Jan 2, 2013, at 1:03 AM, Wei Shi wrote:
>>>>
>>>>> Dear Michele,
>>>>>
>>>>> I had a close look at the data used in your analysis and found that the detection data for some arrays seem to be wrong.
>>>>>
>>>>> With Illumina bead array data, probes with larger intensities should have an equal or higher detection score (or an equal or lower detection p-value) than probes with lower intensities. However, this is not the case for some of the arrays in this dataset. The second column in your 'maqc' object is one such array. My code below found 325 probes that had larger intensities but smaller detection scores:
>>>>>
>>>>>> tmp_sel <- !duplicated(maqc$E[,2])
>>>>>> d2e <- maqc$E[tmp_sel,2]
>>>>>> d2d <- maqc$other$Detection[tmp_sel,2]
>>>>>> d2ds <- d2d[order(d2e)]
>>>>>> sum(d2ds[-1] - d2ds[-c(length(d2ds))] < 0 )
>>>>> [1] 325
>>>>>
>>>>> This is why sigma could not be calculated: the quantity under the square root came out negative, so sigma was NaN. It is not a problem with the normexp.fit.detection.p function but with the data.
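To run the same check on every array at once, the code above can be wrapped in a small helper. The name `count_detection_inversions` and its `pvalue` argument are hypothetical. Set `pvalue = TRUE` if the detection column holds p-values rather than scores, because the expected direction is then reversed. For each array the helper drops tied intensities, orders the detection values by intensity and counts the places where they move the wrong way.

```r
# Count, for each array, how often the detection value moves the "wrong" way
# as intensity increases (a generalisation of the single-array check above).
count_detection_inversions <- function(E, detection, pvalue = FALSE) {
  stopifnot(identical(dim(E), dim(detection)))
  out <- vapply(seq_len(ncol(E)), function(j) {
    keep <- !duplicated(E[, j])                  # drop tied intensities, as in the original check
    d <- detection[keep, j][order(E[keep, j])]   # detection values by increasing intensity
    if (pvalue) d <- -d                          # p-values should fall, not rise, with intensity
    sum(diff(d) < 0)                             # drops in detection score despite higher intensity
  }, integer(1))
  names(out) <- colnames(E)
  out
}

# Toy example: array "a2" has one inversion, array "a1" has none.
E <- cbind(a1 = c(10, 20, 30, 40), a2 = c(10, 20, 30, 40))
D <- cbind(a1 = c(0.1, 0.5, 0.8, 0.9), a2 = c(0.1, 0.8, 0.5, 0.9))
count_detection_inversions(E, D)
```

On the object from the original post the call would be `count_detection_inversions(maqc$E, maqc$other$Detection)`.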
>>>>>
>>>>> You could contact the data submitter and ask them to correct this.
>>>>>
>>>>> Let me know if we could be of any further assistance.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Wei
>>>>>
>>>>>
>>>>> On Dec 23, 2012, at 12:45 PM, Michele wrote:
>>>>>
>>>>>> I am trying to process the raw data downloaded from:
>>>>>> http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-380
>>>>>>
>>>>>> When I call the function neqc, I get the following error:
>>>>>>
>>>>>> Error in if (sigma <= 0) stop("sigma must be positive") :
>>>>>> missing value where TRUE/FALSE needed
>>>>>>
>>>>>> The problem seems to be in this line:
>>>>>>
>>>>>> In sqrt(weighted.mean(v, freq) * n/(n - 1))
>>>>>>
>>>>>> of the function normexp.fit.detection.p
>>>>>>
>>>>>> This happens because the function computes differences between p-values, and some of those differences turn out to be negative.
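The failure mode can be reproduced in a few lines of base R. This is a toy illustration with made-up numbers, not limma's actual code. It only assumes, as described above, that differences between consecutive detection p-values end up as weights in a weighted mean of non-negative values, which is then passed to sqrt(). One negative weight is enough to make that mean negative. sqrt() then returns NaN, and the `if (sigma <= 0)` test fails with the error shown earlier.

```r
# Made-up detection p-values ordered by increasing intensity; for consistent data
# they should never increase, but here the third one does.
p <- c(0.50, 0.20, 0.35, 0.05)
w <- -diff(p)          # weights: 0.30, -0.15, 0.30 (one negative)

v <- c(1, 10, 1)       # made-up non-negative values (e.g. squared deviations)
n <- length(v)

wm <- weighted.mean(v, w)                           # negative because of the negative weight
sigma <- suppressWarnings(sqrt(wm * n / (n - 1)))   # sqrt of a negative number: NaN

msg <- tryCatch({
  if (sigma <= 0) stop("sigma must be positive")
  "no error"
}, error = function(e) conditionMessage(e))
msg
# [1] "missing value where TRUE/FALSE needed"
```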
>>>>>>
>>>>>> Here is the code I am using to process the data:
>>>>>>
>>>>>> library(limma)   # read.ilmn() and neqc() are limma functions
>>>>>>
>>>>>> data.dir <- "~/Workspaces/data/E-MTAB-380/E-MTAB-380.raw.1/"
>>>>>> files <- dir(data.dir, pattern = "\\.txt$")
>>>>>>
>>>>>> # Sample names are the file names without the ".txt" extension;
>>>>>> # samples whose name contains "RR" are labelled "MT", the others "WT".
>>>>>> sample.name <- sub("\\.txt$", "", files)
>>>>>> group <- ifelse(grepl("RR", sample.name), "MT", "WT")
>>>>>>
>>>>>> maqc <- read.ilmn(files = files, path = data.dir, probeid = "Reporter name",
>>>>>>                   other.columns = c("Detection", "Avg_NBEADS"))
>>>>>>
>>>>>> colnames(maqc$E) <- sample.name
>>>>>> colnames(maqc$other$Detection) <- sample.name
>>>>>> colnames(maqc$other$Avg_NBEADS) <- sample.name
>>>>>> maqc$targets <- data.frame(SampleName = sample.name, Group = group)
>>>>>>
>>>>>> # This call fails inside normexp.fit.detection.p()
>>>>>> maqc.norm <- neqc(maqc, detection.p = "Detection")
>>>>>>
>>>>>> How can I overcome this?
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>>
>>>>> ______________________________________________________________________
>>>>> The information in this email is confidential and intended solely for the addressee.
>>>>> You must not disclose, forward, print or use it without the permission of the sender.
>>>>> ______________________________________________________________________
>>>>
>>>
>>>
>>
>>
>