[BioC] Some Genefilter questions
Amy Mikhail
a.mikhail at abdn.ac.uk
Thu Nov 30 22:35:47 CET 2006
Hi all,
Jenny, just wanted to clarify what you said; you reckon if I only want to
remove the foreign species probesets I should do this before
preprocessing, but if I want to remove e.g. absent calls from my own
species probes I should do this after preprocessing. Is this right?
Also, how do I create the character vector of my parasite probesets for
your code?
Robert, I tried subsetting after preprocessing but before analysis. It
made no difference to the order of the probesets, but the numbers changed
slightly (all the probesets had slightly higher adjusted P.values after
removing the parasite probes). See below:
(a) Toptable for full dataset:
      ID                           M         A          t      P.Value   adj.P.Val        B
 5808 Ag.2R.2004.0_CDS_at  -1.870657  9.585064 -16.705963 2.730301e-07 0.006216623 4.207052
12128 Ag.3R.1526.1_a_at    -1.129926  9.969329 -13.778759 1.140079e-06 0.010670646 3.731215
 6675 Ag.2R.274.0_UTR_a_at -2.967667  9.851482 -13.392310 1.405944e-06 0.010670646 3.650675
 6676 Ag.2R.274.1_CDS_a_at -1.871438  9.486805 -12.842425 1.913317e-06 0.010891076 3.526999
 7614 Ag.2R.354.0_UTR_at   -1.266767  8.481348 -11.394707 4.581189e-06 0.020119389 3.141374
 4531 Ag.2L.992.0_CDS_at    2.026152  9.203893  11.167484 5.301785e-06 0.020119389 3.071661
 7990 Ag.2R.424.0_CDS_a_at  1.240622  9.747394  10.326106 9.329289e-06 0.030345512 2.787711
 7615 Ag.2R.354.16_a_at    -2.045494  9.100215 -10.046394 1.135967e-05 0.032331041 2.683414
13171 Ag.3R.2423.0_CDS_at  -0.962208  6.088883  -9.672024 1.489835e-05 0.032613809 2.535235
 1233 Ag.2L.1092.1_a_at     0.967778 11.195894   9.604850 1.565626e-05 0.032613809 2.507552
 3645 Ag.2L.387.0_CDS_at   -1.291859  6.257007  -9.596269 1.575616e-05 0.032613809 2.503991
 6674 Ag.2R.274.0_CDS_s_at -1.748227  8.217272  -9.022044 2.439458e-05 0.046286683 2.252335
(b) Toptable for dataset minus parasite probesets:
      ID                            M         A          t      P.Value   adj.P.Val          B
 5808 Ag.2R.2004.0_CDS_at  -1.8706568  9.585064 -16.460263 4.609906e-07 0.008415383 4.22498712
12128 Ag.3R.1526.1_a_at    -1.1299262  9.969329 -13.637285 1.764053e-06 0.013877872 3.73030514
 6675 Ag.2R.274.0_UTR_a_at -2.9676671  9.851482 -13.144767 2.289137e-06 0.013877872 3.61989734
 6676 Ag.2R.274.1_CDS_a_at -1.8714376  9.486805 -12.626803 3.040892e-06 0.013877872 3.49400490
 7614 Ag.2R.354.0_UTR_at   -1.2667670  8.481348 -11.227966 6.932125e-06 0.024830944 3.09513993
 4531 Ag.2L.992.0_CDS_at    2.0261521  9.203893  10.968142 8.161362e-06 0.024830944 3.01011426
 7990 Ag.2R.424.0_CDS_a_at  1.2406222  9.747394  10.167325 1.380828e-05 0.036010013 2.72261326
 7615 Ag.2R.354.16_a_at    -2.0454939  9.100215  -9.863084 1.702538e-05 0.038169133 2.60232832
13171 Ag.3R.2423.0_CDS_at  -0.9622079  6.088883  -9.542971 2.135453e-05 0.038169133 2.46851929
 1233 Ag.2L.1092.1_a_at     0.9677780 11.195894   9.475125 2.242393e-05 0.038169133 2.43915802
 3645 Ag.2L.387.0_CDS_at   -1.2918594  6.257007  -9.440086 2.299975e-05 0.038169133 2.42385347
 6674 Ag.2R.274.0_CDS_s_at -1.7482273  8.217272  -8.858858 3.545759e-05 0.053939852 2.15526082
Why would the adjusted P values be higher in the second case (number of
parasite probes removed was about 4,000)?
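
As a quick arithmetic check on the two tables above, the sketch below
compares the raw and adjusted P values of the top-ranked probeset. It
assumes the adj.P.Val column is a Benjamini-Hochberg adjustment (an
assumption; the adjustment method is not stated in this message). Under
BH the top-ranked adjusted value equals the raw P value times the number
of tests whenever the second-ranked adjusted value is larger, which is
the case in both tables. The helper name implied_tests is hypothetical.

```python
# Top-ranked probeset (Ag.2R.2004.0_CDS_at), values copied from tables (a) and (b).
p_full, adj_full = 2.730301e-07, 0.006216623   # full dataset
p_sub, adj_sub = 4.609906e-07, 0.008415383     # parasite probesets removed


def implied_tests(p, adj):
    """Number of tests n implied by adj = p * n at rank 1.

    Assumes a Benjamini-Hochberg adjustment (not confirmed by the post).
    """
    return round(adj / p)


n_full = implied_tests(p_full, adj_full)
n_sub = implied_tests(p_sub, adj_sub)

# Split the change in the adjusted value into its two factors.
p_ratio = p_sub / p_full        # change in the raw P value
n_ratio = n_sub / n_full        # change in the number of tests
adj_ratio = adj_sub / adj_full  # change in the adjusted P value

print("implied tests:", n_full, n_sub, "removed:", n_full - n_sub)
print("raw P ratio:", round(p_ratio, 3))
print("test-count ratio:", round(n_ratio, 3))
print("adjusted P ratio:", round(adj_ratio, 3))
```

Under that assumption table (a) implies 22769 tests and table (b) 18255,
a difference of 4514 probesets. The smaller number of tests on its own
would lower the adjusted values. They rise because the raw P.Value and t
columns themselves differ between the two tables, while M and A are
unchanged.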
Regards,
Amy
---------------------------------------------------------------------------
> Hi,
>
> It may be worth pointing out that a related question can have a huge
> impact on normalization of certain glass arrays. One of the standard
> protocols on the Agilent 44K human arrays causes several hundred control
> spots to light up extremely brightly in the green channel, but remain
> completely off in the red channel. If you leave these control spots in
> the data set when you normalize between channels (i.e., within arrays),
> every known normalization method breaks -- in the precise sense that it
> will systematically distort the comparison between the red and green
> channels. If you then model the data incorporating a dye effect, you
> will think that almost every gene exhibits a dye bias. On the other
> hand, if you remove these control spots before normalizing between
> channels, then modeling the dye bias suggests that it rarely exists....
>
> As for the question originally asked here, I would not expect the
> foreign species probes to break the normalization (unless they somehow
> light up in one group of samples but not in the other). So, my own bias
> would be to keep them for background correction and normalization, but
> remove them before the rest of the analysis.
>
> Best,
> Kevin
>
> Jenny Drnevich wrote:
>> Hi Amy,
>>
>> Don't you just love it when you get one response suggesting you do one
>> thing (remove malarial genes after pre-processing) and another response
>> suggesting the opposite? Although I think in this case Robert was
>> suggesting you remove them after pre-processing because it was easier
>> than trying to modify either the normalization code or the cdf
>> environment, which is what Jim pointed out to you. I ran into this same
>> problem with having probesets for other species on the soybean array,
>> which is why I used Ariel's code. I think that if you're using a mixed
>> species array but only put one of the species on it, then you should
>> remove the other species' probesets BEFORE doing the normalization
>> because they really have no bearing on the transcriptome you're trying
>> to measure. On the other hand, if you also want to filter your species'
>> probesets based on presence/absence, minimum cutoff, variation, etc.*,
>> then you should filter these genes AFTER doing the pre-processing
>> because these probesets do contain information about the transcriptome,
>> even if it is just 'not detectably expressed'.
>>
>> Cheers,
>> Jenny
>>
>> * Contrary to Robert, I prefer to filter on presence/absence (using
>> Affy's calls) rather than variability :) I don't know if there is any
>> documentation on which may be "better"...
>>
-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
00-44-1224-273256 (office)