[BioC] Illumina Probe_ID used in the LIMMA package for neqc function
Wil D'Avigdor
w.davigdor at centenary.org.au
Tue Jul 9 09:21:09 CEST 2013
Hi Wei,
For probe filtering, I have been using a detection p-value cut-off of 0.01, keeping a probe if at least one sample in my data set passes this threshold. This reduces the number of probes from 48,701 to 16,877.
Could you confirm whether this is a suitable threshold for my analyses?
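
The filtering rule described above (keep a probe if its detection p-value is below the cut-off in at least one sample) can be sketched as follows. This is a plain-Python illustration on made-up values, not limma code: the probe IDs, the `detection_p` table and the `keep_expressed` helper are all hypothetical, and the choice of a strict comparison (`p < cutoff`) is an assumption.

```python
# Toy detection p-values: keys are probes, list entries are samples.
# All probe IDs and values below are made up for illustration.
detection_p = {
    "probe_A": [0.000, 0.200, 0.450],  # passes in sample 1
    "probe_B": [0.300, 0.500, 0.120],  # never passes
    "probe_C": [0.020, 0.009, 0.700],  # passes in sample 2
    "probe_D": [0.010, 0.010, 0.010],  # equals the cutoff, so never strictly below it
}


def keep_expressed(p_values_by_probe, cutoff=0.01, min_samples=1):
    """Return the probe IDs whose detection p-value is strictly below
    `cutoff` in at least `min_samples` samples (hypothetical helper)."""
    kept = []
    for probe_id, p_values in p_values_by_probe.items():
        # Count the samples in which this probe passes the threshold.
        n_passing = sum(1 for p in p_values if p < cutoff)
        if n_passing >= min_samples:
            kept.append(probe_id)
    return kept


kept_probes = keep_expressed(detection_p, cutoff=0.01, min_samples=1)
print(kept_probes)
```

Raising `min_samples` makes the filter stricter, and raising `cutoff` makes it more lenient, so the number of retained probes depends on both choices.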
Many thanks in advance,
Wil
On 04/07/2013, at 6:37 PM, Wei Shi <shi at wehi.EDU.AU> wrote:
> Hi William,
>
> Please keep the posts on the list.
>
> You should certainly remove from the analysis those probes which are not expressed in any of your samples, i.e. keep only the probes which are expressed in at least one sample. You can do so by applying a detection p-value cutoff (e.g. 0.05 or 0.01), or you may run the propexpr function to estimate the proportion of expressed probes and then use that information to filter out probes. See ?propexpr for more details.
>
> Best wishes,
>
> Wei
>
> On Jul 4, 2013, at 2:55 PM, William D'Avigdor wrote:
>
>> Hi Wei,
>>
>> Many thanks for your response.
>>
>> I would like to ask you another question, specifically about probe filtering.
>>
>> So far I have performed all my analyses on UNFILTERED Illumina data from Genome Studio. Is it still VALID to use unfiltered Illumina data, rather than probes filtered against the background signal at a particular detection p-value (e.g. p=0.01, or 0.1 according to your paper: Illumina WG-6 BeadChip strips should be normalised separately)?
>>
>> I am assuming that when performing hierarchical clustering on the full data, the genes at background level will not contribute significantly to the clustering. However, I do notice that the clustering distances are visibly narrowed: the samples appear closer together than they otherwise would.
>>
>> Further, when performing t-tests / LIMMA on the full data, those genes that are close to background level should not contribute to significant differences across groups. Is this correct? And is there anything I am missing, apart from perhaps an effect on the FDR?
>>
>> Many thanks,
>> Wil
>>
>> On 2/07/2013 7:18 PM, Wei Shi wrote:
>>> Dear William,
>>>
>>> What you have done is correct. As you have found, the 'ProbeID' is the same as the Array_Address_ID. The 'ProbeID' column was used in the old versions of Illumina BeadChip arrays, and it was later replaced with 'PROBE_ID' in the newer versions of BeadChips.
>>>
>>> The neqc() function uses negative control probes to carry out background correction. The 'TargetID' column in the control probe profile file indicates the type of each control probe, and the negative control probes have the type 'NEGATIVE'. neqc() also uses all the probes, including regular probes and all types of control probes (negative controls, housekeeping, ...), to perform a quantile between-array normalization.
>>>
>>> Best wishes,
>>>
>>> Wei
>>>
>>> On Jul 2, 2013, at 3:56 PM, William D'Avigdor wrote:
>>>
>>>> Hi,
>>>>
>>>> I am doing some Illumina analysis using HumanWG-6_V2 microarrays, and have been using the annotation file: HumanWG-6_V2_0_R4_11223189_A.bgx, and I am normalising using the NEQC function in the LIMMA package.
>>>>
>>>> I know there are traditionally a number of Illumina identifiers and I am concerned that I may have been using the wrong ones. I'm not sure whether this has affected the normalisation procedure, or anything at all.
>>>>
>>>> After summarisation in Genome Studio, when looking at the 'Sample Probe Profile', the main identifiers that come up (and which I have used in LIMMA) are 'PROBE_ID' and 'SYMBOL', the first row being ILMN_1762337 and 7A5 respectively. I also noticed that this PROBE_ID column was the one used in the Illumina example in the LIMMA manual.
>>>>
>>>> HOWEVER, in Genome Studio, there is also a column called 'ProbeID'. This does not exist in the original annotation file (HumanWG-6_V2_0_R4_11223189_A), but it is identical to the Array_Address_ID (except for the preceding 000s). The Array_Address_ID appears both in Genome Studio and in the annotation file, and is UNIQUE to the version of the microarray.
>>>>
>>>> IN CONTRAST, in the 'Control Probe Profile' in Genome Studio, there is only the 'TargetID' and the 'ProbeID' available, the latter of which (I believe) is the Array_Address_ID?
>>>>
>>>> HENCE, for the LIMMA input, I am wondering whether I am correct when I have included the Sample Probe ID text file (which includes PROBE_ID, that is, ILMN_1762337), and the Control Probe ID text file (which includes ProbeID instead, which is most likely the Array Address ID).
>>>>
>>>> Many thanks in advance,
>>>> William d'Avigdor
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the addressee.
>>> You must not disclose, forward, print or use it without the permission of the sender.
>>> ______________________________________________________________________