[BioC] how to get probe ids??
James W. MacDonald
jmacdon at med.umich.edu
Thu Sep 15 15:40:11 CEST 2011
Hi Anand,
On 9/14/2011 11:46 AM, anand m t wrote:
> Hi all..
>
> I'm very new to microarray analysis.
> I've been given two datasets for analysis (experimental and control,
> with 3 replicates each).
> I've encountered the following errors/problems.
>
> 1. Whenever I try to run mas5calls, it throws an error about the
> presence of NA/NaN/Inf values in the data.
>
>> affy.data=ReadAffy()
>> data.mas5calls = mas5calls(affy.data)
> Getting probe level data...
> Computing p-values
> Error in FUN(1:6[[1L]], ...) :
> NA/NaN/Inf in foreign function call (arg 2)
>
> Then I tried removing NAs using the following command:
>
>> na.omit(affy.data)
> AffyBatch object
> size of arrays=1050x1050 features (18 kb)
> cdf=MoGene-1_0-st-v1 (35556 affyids)
> number of samples=6
> number of genes=35556
> annotation=mogene10stv1
> notes=
This problem arises because mas5calls() is a method for determining if
the perfect match (PM) probes are significantly different from the
mismatch (MM) probes. However, the mogene chip has no MM probes, so you
cannot compute mas5calls in the conventional sense.
In fact, the affy package isn't really designed to process the newer
version of Affy chips, so you are doing yourself a disservice by using
it. Instead you should be using either the oligo or xps package.
The oligo package will allow you to compute DABG (detection above
background) calls, which are the successor to the mas5calls. The xps
package will also allow you to do this, and will even allow you to
compute mas5calls. However, given that there are no MM probes, it cannot
be computing the conventional mas5calls, so I would recommend sticking
with DABG.
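As a rough illustration of the oligo route, here is a minimal sketch. It assumes the oligo package and the matching annotation package (pd.mogene.1.0.st.v1 for this chip) are installed. The wrapper name `dabg_calls`, its `cel_dir` argument, and the 0.05 cutoff are hypothetical choices for the example, not part of oligo.

```r
# Sketch only: assumes the Bioconductor packages oligo and
# pd.mogene.1.0.st.v1 are installed. `dabg_calls`, `cel_dir` and
# `alpha` are hypothetical names, not part of oligo.
dabg_calls <- function(cel_dir, alpha = 0.05) {
  # Collect the CEL files in the given directory.
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  # For Gene ST chips this returns a GeneFeatureSet.
  raw <- oligo::read.celfiles(cel_files)
  # "PSDABG" gives one detection p-value per probeset and sample;
  # "DABG" would give one per probe instead.
  p <- oligo::paCalls(raw, "PSDABG")
  # The cutoff used to call a probeset "present" is an arbitrary choice.
  list(p.value = p, present = p < alpha)
}
```

Calling `dabg_calls(".")` from the directory holding the six CEL files would then return the p-value matrix together with a logical present/absent matrix.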
>
> But even after that, the same problem exists. How do I solve this?
>
> 2. I skipped this step and proceeded with the next one. I calculated
> p-values and extracted all statistically significant probes. But
> look at my probe names (RMA-normalized data):
>
> probe_names control_1 control_2 control_3 experimental_1 experimental_2 experimental_3
> 10338001 11.70433113 11.09411799 11.17114406 12.3810603 11.3593078 11.30987883
> 10338002 7.455822379 7.022795366 6.977515221 7.863429983 6.659503501 6.583122799
> 10338003 9.944000269 9.329330062 9.439069933 10.87092521 9.507433404 9.421644356
> 10338004 8.807458574 8.190795944 8.336526249 9.666564028 8.555489147
>
> It doesn't have any probe extensions such as "_at", etc. What might be
> the problem? How do I proceed now?
The problem is that you aren't working with a 3'-biased array (which had
probeset IDs of that form). The Gene ST arrays just have numerical
probeset IDs, which is what you see there.
You proceed as normal with these arrays. Compute some sort of summary
statistic for each probeset, then fit whatever univariate linear model
you deem necessary, and extract the 'top' probesets for further exploration.
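To make that workflow concrete, here is a small base-R sketch on simulated data. The matrix, the spiked-in difference, and the helper `fit_one` are all made up for illustration. In practice a package such as limma is the usual choice for this step; ordinary per-probeset `lm()` fits are used here only to keep the example free of dependencies.

```r
# Simulated log2 expression matrix: rows are probesets, columns are samples.
set.seed(1)
ids <- as.character(10338001:10338020)   # numeric IDs, as on Gene ST arrays
expr <- matrix(rnorm(20 * 6, mean = 9, sd = 0.3), nrow = 20,
               dimnames = list(ids, c(paste0("control_", 1:3),
                                      paste0("experimental_", 1:3))))
expr[1, 4:6] <- expr[1, 4:6] + 3         # spike in one true difference

group <- factor(rep(c("control", "experimental"), each = 3))

# One linear model per probeset: expression ~ group.
fit_one <- function(y) {
  coefs <- summary(lm(y ~ group))$coefficients
  c(logFC   = coefs["groupexperimental", "Estimate"],
    p.value = coefs["groupexperimental", "Pr(>|t|)"])
}

res <- as.data.frame(t(apply(expr, 1, fit_one)))
res$adj.p <- p.adjust(res$p.value, method = "BH")  # multiple-testing correction

# The 'top' probesets, ranked by p-value.
top <- res[order(res$p.value), ]
head(top, 5)
```

The row names of `top` are the plain numeric probeset IDs, which can be mapped to gene annotation afterwards.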
Note however that the Gene ST arrays are basically a subset of the Exon
ST arrays, so the notion of probeset is less fixed than it was with the
3'-biased arrays. In other words, you can define a probeset in one of
two different ways. There is a vignette in oligo that shows how to
process these chips
(http://www.bioconductor.org/packages/2.8/bioc/vignettes/oligo/inst/doc/V5ExonGene.pdf).
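For reference, the two probeset definitions correspond to the `target` argument of oligo's `rma()`. The following is a minimal sketch, assuming oligo and the pd.mogene.1.0.st.v1 annotation package are installed. The wrapper name `summarize_gene_st` and its `cel_dir` argument are hypothetical.

```r
# Sketch only: assumes oligo and pd.mogene.1.0.st.v1 are installed.
# `summarize_gene_st` and `cel_dir` are hypothetical names.
summarize_gene_st <- function(cel_dir) {
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  raw <- oligo::read.celfiles(cel_files)
  list(
    # Transcript-cluster (gene-level) summaries.
    gene = oligo::rma(raw, target = "core"),
    # Finer-grained probeset-level summaries.
    probeset = oligo::rma(raw, target = "probeset")
  )
}
```

Each element of the result is an ExpressionSet, so `Biobase::exprs()` on either one gives a matrix whose row names are the numeric IDs discussed above.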
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826