[BioC] Problem getting the exact ProbeNames

Karsten Voigt karsten.voigt at biologie.uni-freiburg.de
Wed Jan 12 15:28:42 CET 2011


Hi all,

On 01/11/2011 07:36 PM, James W. MacDonald wrote:
> Hi Karsten,
>
> On 1/11/2011 12:56 PM, Karsten Voigt wrote:
>> Dear all,
>>
>> I am currently working on a project where I need to get the exact IDs of
>> probes of a custom Affymetrix Chip in order to merge it with another
>> list containing the sequence.
>>
>> I am using this small R script for creating the list:
>>
>> mitdata <- ReadAffy();
>> stddata <- apply(pm(mitdata), 2, bg.adjust);
>> nrmdata <- normalize.quantiles(stddata);
>> namedata <- probeNames(mitdata);
>> enddata <- cbind(namedata, nrmdata);
>> write.table(enddata, file="probesdata.txt",sep="\t");
>>
>> This is an output example
>>
>> ...
>> 145 TZG_ARR_0001_x_at 135.115780787133 ...
>> 146 TZG_ARR_0001_x_at 147.346049115501 ...
>> 147 TZG_ARR_0001_x_at 203.840215898533 ...
>> 148 TZG_ARR_0003_x_at 48.7635207480323 ...
>> ...
>>
>> As you can see, a number of probes have the same name but refer to
>> different oligos. The number in front of the row is just added by me,
>> therefore you can ignore it.
>>
>> I received a list containing the probe name, a couple of other
>> information AND the sequence.
>>
>> This is a part of it:
>>
>> 15 ggagattgtttgtaatcaaaatgaa TGZ_ARR_0001_x ! 2398 0 176 200 + 1
>> 103 gcaaatttacttctaacagctgatc TGZ_ARR_0001_x ! 2398 1 264 288 + 1
>> 188 ttgatgcaactgtaaacaaaagtgg TGZ_ARR_0001_x ! 2398 2 349 373 + 1
>> 15 gatagattcttcaagtaacaatact TGZ_ARR_0003_x ! 2400 0 2046 2070 + 1
>>
>> This should be the same area.
>>
>> In this received list, I can identify the unique probes using the 2
>> numbers right after the exclamation mark, which are referring to the
>> position on the chip, I guess. How can I extract those coordinates for
>> my own list? I tried it with indices2xy, however I failed to get it
>> running since I don't understand how to use this function correctly.
>
> Using the hgu95av2cdf as an example:
>
> > library(hgu95av2cdf)
> > x <- as.list(hgu95av2cdf)
> > x <- x[order(names(x))]
> > x <- unlist(sapply(x, function(x) x[,1]))
> > xys <- indices2xy(x, cdf="hgu95av2cdf")
> > head(xys)
>            x   y
> 1000_at1 399 559
> 1000_at2 544 185
> 1000_at3 530 505
> 1000_at4 617 349
> 1000_at5 459 489
> 1000_at6 408 545
>
> Best,
>
> Jim
>

First of all, many thanks to Jim for the quick and helpful answer. I ran
your script on my own cdf and it extracts exactly what I am looking
for.

However, I still cannot identify the probes in the CEL files loaded by
the ReadAffy() function. If I run probeNames() on the result, the probe
names come out in alphabetical order. I cannot imagine that the probe
values from the CEL files are sorted alphabetically in the same way.

I think my way of creating this list is wrong: it seems unlikely that
the probe names and the normalized data are listed in the same order,
and I see no way to verify it.

How can I verify that the output of probeNames() matches the rows of
probe values? Is it also possible to extract the x and y values from
the cdf file?
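
To make the question concrete, here is a toy sketch in base R of what I
would hope is happening: names, values and coordinates are all derived
from one and the same index vector, so they cannot get out of step. The
probe-set names, the chip size `nc`, the intensities and the
index-to-xy convention are all made up for illustration; whether
ReadAffy(), pm() and probeNames() really work this way is exactly what
I am unsure about.

```r
# Toy model, base R only; every name and number below is hypothetical.
nc <- 5L                                   # assumed number of cells per chip row

# Stand-in for a cdf: probe-set name -> PM cell indices (1-based).
cdf <- list(SET_B_at = c(7L, 19L, 3L),
            SET_A_at = c(12L, 1L))
cdf <- cdf[order(names(cdf))]              # alphabetical by probe-set name

# Fake CEL intensities: one row per cell on the chip, one column per chip.
set.seed(1)
intensity <- matrix(runif(25 * 2), nrow = 25,
                    dimnames = list(NULL, c("chip1", "chip2")))

# A single index vector drives names, values and coordinates alike.
idx      <- unlist(cdf, use.names = FALSE)
probeset <- rep(names(cdf), lengths(cdf))  # one probe-set name per PM probe
vals     <- intensity[idx, , drop = FALSE] # PM values in the same order

# Assumed convention: index = x + nc * y + 1, with zero-based x and y.
xy <- cbind(x = (idx - 1L) %% nc, y = (idx - 1L) %/% nc)

result <- data.frame(probeset = probeset, xy, vals)

# Consistency check: the coordinates map back to the original indices.
stopifnot(nrow(result) == length(idx),
          all(result$x + nc * result$y + 1L == idx))
```

With a real AffyBatch, I would guess the analogous index comes from
unlist(pmindex(mitdata)) and could be passed to
indices2xy(..., abatch = mitdata), but I have not verified that the
rows of pm(mitdata) follow the same order.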

One other question: is there any way to extract the probe sequences
from the cdf file?

Many thanks in advance again,

Karsten


-- 
_________________________________________________
Karsten Voigt, Msc.
Experimentelle Bioinformatik, Hess Group
University of Freiburg, BIO III
t: 0761-2032708
m: 0176-61110420
e: karsten.voigt at biologie.uni-freiburg.de
