[BioC] Problem getting the exact ProbeNames

Mon Jan 17 08:35:10 CET 2011

Dear all,

thanks for the great input so far. I now have to test it and understand 
it. If there are any problems remaining, I will let you know ;-)

Thanks and best wishes,

Karsten
>
> Hi Karsten,
>
> if you created an AffyBatch x with ReadAffy, then exprs(x) is a matrix 
> whose rows correspond to the probes on the array, one after the other 
> as they are physically laid out on the chip. The mapping between row index in the
> AffyBatch and (x,y)-coordinates is provided by the functions 
> indices2xy and xy2indices in the 'affy' package (whose code you can 
> see by typing their name). Essentially, it is very simple:
>
>     x = (i - 1) %% nr
>     y = (i - 1) %/% nr
> and in reverse:
>     i = x + 1 + nr * y
>
> where nr is the width of the chip. So one strategy is to compute the 
> (x,y) index of each probe on your array by
>
>     indices2xy(seq_len(nrow(exprs(mitdata))), abatch=mitdata)
>
> and use this to merge with your probe-sequence table. This might be 
> easier and more transparent than going through probeNames.
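A minimal base-R sketch of the two formulas above, useful for checking the arithmetic without loading affy. The helper names `index_to_xy` and `xy_to_index` are hypothetical, and `nr = 4` is a made-up chip width; on real data, `indices2xy` and `xy2indices` from affy remain the functions to use.

```r
# Hypothetical helpers mirroring the formulas above; not part of affy.
# i: 1-based row index into exprs(); nr: chip width; x, y: 0-based coordinates.
index_to_xy <- function(i, nr) {
  cbind(x = (i - 1) %% nr, y = (i - 1) %/% nr)
}

xy_to_index <- function(x, y, nr) {
  x + 1 + nr * y
}

# Round trip on a toy chip of width nr = 4 with 12 probe positions.
nr <- 4
i <- seq_len(12)
xy <- index_to_xy(i, nr)
back <- xy_to_index(xy[, "x"], xy[, "y"], nr)
print(cbind(i, xy, back))
```

Converting to (x, y) and back should return the original indices, which is a quick sanity check before merging on coordinates.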
>
> Probe sequences for many Affymetrix chips are obtained through the 
> 'probe' packages (whose content is complementary to the smaller 'cdf' 
> packages):
>
>  library(hgu95av2probe)
>  head(as.data.frame(hgu95av2probe))
>
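The merge suggested above can be sketched in base R with toy data frames. Everything here is hypothetical: the intensities, the placeholder sequence strings, and the chip width `nr = 4`. The sketch assumes the sequence table stores 0-based `x` and `y` columns on the same scale as `indices2xy`, which is worth checking against the real probe package before relying on it.

```r
nr <- 4  # hypothetical chip width

# Toy intensities, one per row index i of exprs() (made-up values).
intens <- data.frame(i = 1:6, value = c(10, 20, 30, 40, 50, 60))
intens$x <- (intens$i - 1) %% nr   # same formula as indices2xy
intens$y <- (intens$i - 1) %/% nr

# Toy sequence table keyed by 0-based (x, y); sequences are placeholders.
seqtab <- data.frame(
  x = c(0, 2, 1, 3),
  y = c(0, 0, 1, 1),
  sequence = c("seq_a", "seq_b", "seq_c", "seq_d"),
  stringsAsFactors = FALSE
)

# Inner join on both coordinates: probes without a sequence row are dropped,
# and so are sequence rows whose (x, y) has no intensity (seq_d here).
merged <- merge(intens, seqtab, by = c("x", "y"))
print(merged)
```

Joining on coordinates rather than on probe-set names sidesteps the problem that several probes share one name.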
>
>     Best wishes
>     Wolfgang
>
>
> Karsten Voigt wrote on 12/01/11 15:28:
>> Hi all,
>>
>> On 01/11/2011 07:36 PM, James W. MacDonald wrote:
>>> Hi Karsten,
>>>
>>> On 1/11/2011 12:56 PM, Karsten Voigt wrote:
>>>> Dear all,
>>>>
>>>> I am currently working on a project where I need to get the exact IDs of
>>>> the probes on a custom Affymetrix chip in order to merge them with another
>>>> list containing the sequences.
>>>>
>>>> I am using this small R script for creating the list:
>>>>
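>>>> library(affy)            # ReadAffy, pm, bg.adjust, probeNames
>>>> library(preprocessCore)  # normalize.quantiles
>>>>
>>>> mitdata  <- ReadAffy()                        # CEL files in the working directory
>>>> stddata  <- apply(pm(mitdata), 2, bg.adjust)  # RMA background correction, per array
>>>> nrmdata  <- normalize.quantiles(stddata)      # returns a matrix without dimnames
>>>> colnames(nrmdata) <- sampleNames(mitdata)
>>>> namedata <- probeNames(mitdata)              # one probe-set name per row of pm(mitdata)
>>>> enddata  <- data.frame(probeset = namedata, nrmdata)
>>>> write.table(enddata, file = "probesdata.txt", sep = "\t")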
>>>>
>>>> Here is an example of the output:
>>>>
>>>> ...
>>>> 145 TZG_ARR_0001_x_at 135.115780787133 ...
>>>> 146 TZG_ARR_0001_x_at 147.346049115501 ...
>>>> 147 TZG_ARR_0001_x_at 203.840215898533 ...
>>>> 148 TZG_ARR_0003_x_at 48.7635207480323 ...
>>>> ...
>>>>
>>>> As you can see, a number of probes have the same name but refer to
>>>> different oligos. I added the row numbers at the start of each line
>>>> myself, so you can ignore them.
>>>>
>>>> I received a list containing the probe name, some other
>>>> information AND the sequence.
>>>>
>>>> Here is part of it:
>>>>
>>>> 15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
>>>> 103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
>>>> 188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
>>>> 15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1
>>>>
>>>> This excerpt should cover the same probes as the output above.
>>>>
>>>> In this received list, I can identify the unique probes using the two
>>>> numbers right after the exclamation mark, which I guess refer to the
>>>> position on the chip. How can I extract those coordinates for my own
>>>> list? I tried indices2xy, but I could not get it to work because I don't
>>>> understand how to use this function correctly.
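One way to read the sequence list into R and build one key per probe from the two numbers after the `!` is `read.table`. This sketch uses the four example lines quoted above; the column names are hypothetical guesses, since the meaning of most fields is not documented in the thread, and the key is simply the two numbers pasted together.

```r
# The four example lines quoted above.
lines <- c(
  "15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1",
  "103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1",
  "188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1",
  "15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1"
)

# Column names are hypothetical guesses; only "sequence" and "probeset" are
# clear from the data itself.
seqlist <- read.table(text = lines, stringsAsFactors = FALSE,
                      col.names = c("n", "sequence", "probeset", "bang",
                                    "num1", "num2", "start", "end",
                                    "strand", "flag"))

# One key per probe from the two numbers right after "!".
seqlist$key <- paste(seqlist$num1, seqlist$num2, sep = "_")
print(seqlist[, c("probeset", "key", "sequence")])
```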
>>>
>>> Using the hgu95av2cdf as an example:
>>>
>>> > library(affy)
>>> > library(hgu95av2cdf)
>>> > x <- as.list(hgu95av2cdf)
>>> > x <- x[order(names(x))]
>>> > x <- unlist(sapply(x, function(x) x[,1]))
>>> > xys <- indices2xy(x, cdf="hgu95av2cdf")
>>> > head(xys)
>>> x y
>>> 1000_at1 399 559
>>> 1000_at2 544 185
>>> 1000_at3 530 505
>>> 1000_at4 617 349
>>> 1000_at5 459 489
>>> 1000_at6 408 545
>>>
>>> Best,
>>>
>>> Jim
>>>
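To see what the three lines above do without installing a cdf package, here is a base-R sketch on a made-up list shaped like `as.list(hgu95av2cdf)`: one matrix per probe set, PM indices in column 1 and MM indices in column 2. The probe indices and `nr = 4` are invented, and the coordinate step reproduces the formula from earlier in the thread rather than calling `indices2xy`. `lapply` is used instead of `sapply` because `sapply` would collapse the result into a matrix if every probe set had the same number of probes.

```r
# Hypothetical stand-in for as.list(<chip>cdf) with toy probe indices.
cdf <- list(
  TZG_ARR_0003_x_at = cbind(pm = c(9, 13), mm = c(10, 14)),
  TZG_ARR_0001_x_at = cbind(pm = c(1, 5, 3), mm = c(2, 6, 4))
)
cdf <- cdf[order(names(cdf))]                    # same ordering step as above
pmi <- unlist(lapply(cdf, function(m) m[, 1]))   # unlist appends 1, 2, ... to names

nr <- 4                                          # hypothetical chip width
xy <- cbind(x = (pmi - 1) %% nr, y = (pmi - 1) %/% nr)
print(xy)
```

The appended suffixes ("...x_at1", "...x_at2") are what make each probe's row name unique, as in the `1000_at1` output above.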
>>
>> First of all, many thanks to Jim for the quick and helpful answer. I ran
>> your script on my own cdf and it extracts exactly what I was looking
>> for.
>>
>> However, I still cannot identify the probes in my CEL files loaded with the
>> ReadAffy() function. If I run probeNames on them, the probe names are
>> returned in alphabetical order. I cannot imagine that the probe values I
>> extracted from the CEL files are sorted alphabetically as well.
>>
>> I think my way of creating this list is wrong, since it seems unlikely,
>> and I cannot prove, that the probe names and the normalized data are
>> listed in the same order.
>>
>> How can I check that the probeNames match the probe values? Is it
>> also possible to extract the x/y values from the cdf file?
>>
>> One other question: is there any way to extract the sequences from
>> the cdf file?
>>
>> Many thanks in advance again,
>>
>> Karsten
>>
>>
>
>

-- 
_________________________________________________
Karsten Voigt, Msc.
Experimentelle Bioinformatik, Hess Group
University of Freiburg, BIO III
t: 0761-2032708
m: 0176-61110420
e: karsten.voigt at biologie.uni-freiburg.de