[BioC] help with dataset

Wed May 27 15:17:39 CEST 2009

Hi Alberto:

In line with Vincent's second suggestion, you could read in the phenoData 
while reading the data (.CEL files). It would then propagate from the 
AffyBatch to the ExpressionSet object. I have used the following approach 
for some time and would like to bounce it off the list.

For example, suppose there is a tab-delimited Targets.txt file:

Sample    Celfile    ES    TYPE
SHR.PUFA5    SHR-PUFA5.CEL    PUFA    SHR
SHR.PUFA6    SHR-PUFA6.CEL    PUFA    SHR
SHR.st7    SHR-st7.CEL    ST    SHR
SHR.st8    SHR-st8.CEL    ST    SHR
WK.PUFA3    WK-PUFA3.CEL    PUFA    WK
WK.PUFA4    WK-PUFA4.CEL    PUFA    WK
WK.st1    WK-st1.CEL    ST    WK
WK.st2    WK-st2.CEL    ST    WK

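Read in this information with readTargets() from limma, which by default 
reads the tab-delimited file Targets.txt in the working directory:
 > library(limma)
 > targets = readTargets()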
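If limma is not loaded, roughly the same table can be read with base R. This is a minimal sketch, assuming Targets.txt is tab-delimited with a header row (what readTargets() expects by default); the file is written to a temporary directory only to keep the example self-contained, and `targetsFile` is a hypothetical name.

```r
# Hypothetical stand-in for Targets.txt: same columns as above, tab-delimited
lines <- c("Sample\tCelfile\tES\tTYPE",
           "SHR.PUFA5\tSHR-PUFA5.CEL\tPUFA\tSHR",
           "SHR.PUFA6\tSHR-PUFA6.CEL\tPUFA\tSHR",
           "SHR.st7\tSHR-st7.CEL\tST\tSHR",
           "SHR.st8\tSHR-st8.CEL\tST\tSHR",
           "WK.PUFA3\tWK-PUFA3.CEL\tPUFA\tWK",
           "WK.PUFA4\tWK-PUFA4.CEL\tPUFA\tWK",
           "WK.st1\tWK-st1.CEL\tST\tWK",
           "WK.st2\tWK-st2.CEL\tST\tWK")
targetsFile <- file.path(tempdir(), "Targets.txt")
writeLines(lines, targetsFile)

# Read it back with character (not factor) columns, so that Celfile can
# later be passed to ReadAffy as plain file names
targets <- read.delim(targetsFile, stringsAsFactors = FALSE)
str(targets)
```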

Then create a phenoData object from this information.

The following creates a data frame that is basically the same as targets, 
with the sample names as row names:
 > myCovs = data.frame(targets)
 > rownames(myCovs) = myCovs[,1]

Count the number of levels in each column of targets:
 > nlev = as.numeric(apply(myCovs, 2, function(x) nlevels(as.factor(x))))
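For the Targets.txt shown above, the two steps work out as follows. This sketch builds the same table inline (the `targets` data frame here is a hypothetical copy, not read from disk) so the level counts can be checked: Sample and Celfile have one level per array, while ES and TYPE each have two.

```r
# Hypothetical inline copy of the Targets.txt columns
targets <- data.frame(
  Sample  = c("SHR.PUFA5", "SHR.PUFA6", "SHR.st7", "SHR.st8",
              "WK.PUFA3", "WK.PUFA4", "WK.st1", "WK.st2"),
  Celfile = c("SHR-PUFA5.CEL", "SHR-PUFA6.CEL", "SHR-st7.CEL", "SHR-st8.CEL",
              "WK-PUFA3.CEL", "WK-PUFA4.CEL", "WK-st1.CEL", "WK-st2.CEL"),
  ES      = c("PUFA", "PUFA", "ST", "ST", "PUFA", "PUFA", "ST", "ST"),
  TYPE    = c("SHR", "SHR", "SHR", "SHR", "WK", "WK", "WK", "WK"),
  stringsAsFactors = FALSE)

myCovs <- data.frame(targets)
rownames(myCovs) <- myCovs[, 1]   # sample names become row names

# apply() coerces the data frame to a character matrix and then counts
# the distinct values (factor levels) in each column
nlev <- as.numeric(apply(myCovs, 2, function(x) nlevels(as.factor(x))))
nlev
# -> 8 8 2 2
```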

Finally, create the metadata data.frame and the AnnotatedDataFrame:
 > library(affy)  # attaches Biobase, which defines AnnotatedDataFrame
 > metadata = data.frame(labelDescription = paste(colnames(myCovs), ": ",
 +     nlev, " level", ifelse(nlev == 1, "", "s"), sep = ""),
 +     row.names = colnames(myCovs))
 > phenoData = new("AnnotatedDataFrame", data = myCovs, varMetadata = metadata)
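To see what ends up in varMetadata, the labelDescription column can be built on its own in base R. This is a sketch that hard-codes the column names and level counts which the nlev line above would give for this Targets.txt (`covNames` is a hypothetical name); the row names of the metadata must match the column names of the data passed to AnnotatedDataFrame.

```r
# Column names and level counts assumed from the Targets.txt above
covNames <- c("Sample", "Celfile", "ES", "TYPE")
nlev <- c(8, 8, 2, 2)

# One row per covariate; "level" gets an "s" unless there is exactly one
metadata <- data.frame(
  labelDescription = paste(covNames, ": ", nlev, " level",
                           ifelse(nlev == 1, "", "s"), sep = ""),
  row.names = covNames)

as.character(metadata$labelDescription)
# -> "Sample: 8 levels" "Celfile: 8 levels" "ES: 2 levels" "TYPE: 2 levels"
```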

Use the phenoData as an argument to ReadAffy:
 > dat = ReadAffy(sampleNames = myCovs$Sample, filenames = myCovs$Celfile,
 +     phenoData = phenoData)

Then normalize with RMA:
 > eset = rma(dat)
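If the data have already been normalized, as with eset.irq.50 in the question quoted below, the two-column table Alberto asks for can also be built directly, keyed by the CEL file names that are already the sample names. This is a sketch under that assumption (`newPheno` is a hypothetical name); ES and TYPE are made explicit factors because factDesign works with factor covariates. The final assignment needs Biobase and the real object, so it is left as comments.

```r
# Row names must equal sampleNames(eset.irq.50), i.e. the CEL file names
celfiles <- c("SHR-PUFA5.CEL", "SHR-PUFA6.CEL", "SHR-st7.CEL", "SHR-st8.CEL",
              "WK-PUFA3.CEL", "WK-PUFA4.CEL", "WK-st1.CEL", "WK-st2.CEL")

newPheno <- data.frame(
  ES   = factor(c("PUFA", "PUFA", "ST", "ST", "PUFA", "PUFA", "ST", "ST")),
  TYPE = factor(c("SHR", "SHR", "SHR", "SHR", "WK", "WK", "WK", "WK")),
  row.names = celfiles)

newPheno

# With Biobase loaded and eset.irq.50 available, something like:
# stopifnot(identical(rownames(newPheno), sampleNames(eset.irq.50)))
# pData(eset.irq.50) <- newPheno
```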

I hope that works.

Saroj

Sample 	Celfile 	ES 	TYPE
SHR.PUFA5 	SHR-PUFA5.CEL 	PUFA 	SHR
SHR.PUFA6 	SHR-PUFA6.CEL 	PUFA 	SHR
SHR.st7 	SHR-st7.CEL 	ST 	SHR
SHR.st8 	SHR-st8.CEL 	ST 	SHR
WK.PUFA3 	WK-PUFA3.CEL 	PUFA 	WK
WK.PUFA4 	WK-PUFA4.CEL 	PUFA 	WK
WK.st1 	WK-st1.CEL 	ST 	WK
WK.st2 	WK-st2.CEL 	ST 	WK

Alberto Goldoni wrote:
>> Hello everybody,
>> I have a little problem with my dataset. At the moment my data, as shown by
>> pData(eset.irq.50), look like this:
>>
>> > eset.irq.50
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 1227 features, 8 samples
>>   element names: exprs
>> phenoData
>>   sampleNames: SHR-PUFA5.CEL, SHR-PUFA6.CEL, ..., WK-st2.CEL  (8 total)
>>   varLabels and varMetadata description:
>>     sample: arbitrary numbering
>> featureData
>>   featureNames: 1367555_at, 1367556_s_at, ..., 1399089_at  (1227 total)
>>   fvarLabels and fvarMetadata description: none
>> experimentData: use 'experimentData(object)'
>> Annotation: rat2302
>>
>> > pData(eset.irq.50)
>>               sample
>> SHR-PUFA5.CEL      1
>> SHR-PUFA6.CEL      2
>> SHR-st7.CEL        3
>> SHR-st8.CEL        4
>> WK-PUFA3.CEL       5
>> WK-PUFA4.CEL       6
>> WK-st1.CEL         7
>> WK-st2.CEL         8
>>
>> I would like to modify these data to obtain:
>>
>> > pData(eset.irq.50)
>>                 ES TYPE
>> SHR-PUFA5.CEL PUFA  SHR
>> SHR-PUFA6.CEL PUFA  SHR
>> SHR-st7.CEL     ST  SHR
>> SHR-st8.CEL     ST  SHR
>> WK-PUFA3.CEL  PUFA   WK
>> WK-PUFA4.CEL  PUFA   WK
>> WK-st1.CEL      ST   WK
>> WK-st2.CEL      ST   WK
>>
>> so that I can perform a factDesign analysis.
>>
>> Can somebody help me?
>>
>>
>> BEST REGARDS.
>>
>> --
>> -----------------------------------------------------
>> Dr. Alberto Goldoni
>> Bologna, Italy
>> -----------------------------------------------------