[BioC] Integrating Codelink data with Bioconductor (using affy and limma functions)
Diego Díez Ruiz
ddiez at iib.uam.es
Mon Apr 25 15:15:22 CEST 2005
Gordon Smyth wrote:
> At 09:35 PM 25/04/2005, Diego Díez Ruiz wrote:
>
>> Dear Gordon,
>>
>> Thanks for your response. I will use the data as before, but which do
>> you think would affect the normalization process more: some points
>> assigned NA values, or some points with lower A values because one of the
>> intensities was assigned a value of, say, 0.01?
>
>
> Unless you're doing much more than I think you are, you must avoid NAs
> at all costs. If you have to live with low intensities, then so be it.
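To make the trade-off in the question concrete, here is a small Python sketch (the thread itself contains no code). It computes limma-style M and A values for a pair of intensities, M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2, as in a pairwise MA plot. The function name `ma_values` and the choice to treat missing or non-positive intensities as NaN unless a floor is given are assumptions for illustration; the floor of 0.01 is the value from the question.

```python
import math


def ma_values(x, y, floor=None):
    """Return (M, A) for two intensities on the log2 scale.

    Hypothetical helper: missing (None) or non-positive intensities become
    NaN, unless `floor` is given, in which case they are replaced by it.
    """
    def log2_or_nan(v):
        if v is None or v <= 0:
            if floor is None:
                return float("nan")
            v = floor
        return math.log2(v)

    lx, ly = log2_or_nan(x), log2_or_nan(y)
    return lx - ly, (lx + ly) / 2


print(ma_values(1024.0, 256.0))              # (2.0, 9.0)
print(ma_values(1024.0, -5.0))               # (nan, nan)
print(ma_values(1024.0, -5.0, floor=0.01))   # finite: large M, low A
```

With NaN the point carries no information through any later arithmetic, whereas flooring keeps it as a finite point at low A (about 1.68 here) with an extreme M (about 16.6), which an intensity-dependent normalization can still place on the A axis.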
>
>> I'd be glad to let you see my class definition and parser, of course.
>> This is really the first time I have used classes and stored everything
>> as an R package, so I thought that the best way to make something usable
>> quickly, without having to read all of "Writing R Extensions", was to
>> learn from other packages (that is one of the great things about
>> open source :). Of course I will have to read it one day.
>> Briefly:
>> 1. The parser reads text files exported from the CodeLink software.
>
>
> I've never seen Codelink output, but my understanding is that it is
> essentially just ImaGene output. Is that not correct?
I've never seen ImaGene output. These are the header and column names from
CodeLink output:
CodeLink Expression Analysis 4.1.0.29054
CNIC Report for Slide (T00241792)
LAYOUT EXP294X192-912.22.ID
PROJECT
EXPERIMENT
PRODUCT Human Whole Genome
Sample Name Array 1 Sample001
Median Array 1 86,6547470092773
Report( 1 ): 310105-Person
--------------------------------------------------------------------------------
Idx Array Sample_name Probe_name Annotation_PIN Annotation_NCBI_Acc
Annotation_NCBI_NID Annotation_LocusLink Annotation_OGS
Annotation_UniGene Annotation_ENSEMBL Probe_type Feature_id
Raw_intensity Normalized_intensity Quality_flag Signal_strength
Logical_row Logical_col Center_X Center_Y Spot_mean Spot_median
Spot_stdev Spot_area Spot_diameter Spot_noise_level Bkgd_mean
Bkgd_median Bkgd_stdev Bkgd_area Annotation_Molecular_Function
Annotation_Biological_Process Annotation_Cellular_Component
Annotation_Cytoband Annotation_HS_Homology Annotation_MM_Homology
Annotation_RN_Homology Annotation_Analogous_CodeLink
Annotation_Legacy_Probe_Name Description
The header can be shorter than 10 rows (it is customizable), and the
columns can be customized too (for example, in my own data I leave out the
Annotation_* and Description columns). I'm not sure whether this example
includes all the possible fields.
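A minimal Python sketch of the parsing step, assuming (this is not stated in the post) that the header ends at the dashed separator line shown above and that the column-name line and the data rows are tab-separated. Locating the separator instead of skipping a fixed number of lines copes with shorter custom headers, and reading each row by column name copes with customized column sets. `read_codelink_export` and `to_float` are hypothetical names; `to_float` handles the decimal comma seen in the `Median Array 1` line above.

```python
import csv
import io


def read_codelink_export(text):
    """Split a CodeLink-style text export into header lines, column names
    and data rows (one dict per row, keyed by column name)."""
    lines = text.splitlines()
    # The header length varies, so find the dashed separator line
    # instead of skipping a fixed number of lines.
    sep = next((i for i, line in enumerate(lines)
                if line.strip() and set(line.strip()) == {"-"}), None)
    if sep is None:
        raise ValueError("no dashed separator line found")
    header = [line for line in lines[:sep] if line.strip()]
    # Assumption: column names and values are tab-separated.
    reader = csv.DictReader(io.StringIO("\n".join(lines[sep + 1:])),
                            delimiter="\t")
    rows = list(reader)
    return header, reader.fieldnames, rows


def to_float(value):
    # Handles a decimal comma such as "86,6547470092773";
    # does not handle thousands separators.
    return float(value.replace(",", "."))
```

Because rows are dictionaries keyed by column name, a missing optional column simply never appears as a key, rather than shifting every later column.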
D
>
> Gordon
>
>> It works fine with 3 different chip types, so I think it should work
>> with all of them. One problem is that the exported text data have custom
>> fields (you can choose among all fields, including Raw_intensity,
>> Median_foreground, etc.), so it is possible to find files in which some
>> fields were not exported. I know that it is possible to export as XML,
>> but I haven't tried that yet.
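Because exported files may lack some fields, a loader can check the column names before building any object: stop with an error when a mandatory field is absent, and only warn when an optional one is. A sketch in Python; which fields count as required or optional is a hypothetical choice here, since the post does not say.

```python
import warnings

# Hypothetical classification; the post does not list mandatory fields.
REQUIRED = ("Probe_name", "Probe_type", "Normalized_intensity", "Quality_flag")
OPTIONAL = ("Raw_intensity", "Spot_mean", "Bkgd_median")


def check_fields(fieldnames, required=REQUIRED, optional=OPTIONAL):
    """Raise if a required field is missing; warn about missing optional
    fields and return their names."""
    present = set(fieldnames)
    missing = [f for f in required if f not in present]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    absent = [f for f in optional if f not in present]
    if absent:
        warnings.warn("optional field(s) not exported: " + ", ".join(absent))
    return absent
```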
>>
>> 2. The class definition is very simple. I based it on RGList and reused
>> almost all of the methods for dim(), as.matrix(), etc. that you define
>> in limma. I also based the subsetting system on the one used for
>> AffyBatch objects in affy. A Codelink object stores 3 matrices as a
>> list: one of intensities, one of flags, and a last one with probe name
>> and probe type. I currently call these the "val", "flags" and "info"
>> slots, but I don't think those names are appropriate, so this week I
>> want to import all possible fields and name them as they are called in
>> the exported files. I will probably also check which fields are present
>> and issue a warning or an error if a *must have* field is missing.
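The container described here (three aligned components, with `dim()` and subsetting) can be sketched in Python as below. This is a hypothetical analogue, not the author's R class or limma's RGList; it reuses the "val", "flags" and "info" names from the post. The key design point is that subsetting indexes all three components with the same probe indices, so probe annotations never drift out of line with the intensities and flags.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CodelinkSet:
    """Hypothetical container mirroring the three components in the post.

    val and flags are probes x arrays matrices (lists of rows);
    info holds one (probe_name, probe_type) pair per probe.
    """
    val: List[List[float]]
    flags: List[List[str]]
    info: List[Tuple[str, str]]

    def dim(self) -> Tuple[int, int]:
        # (number of probes, number of arrays), analogous to dim() in R
        n_arrays = len(self.val[0]) if self.val else 0
        return len(self.val), n_arrays

    def subset(self, probes: Optional[Sequence[int]] = None,
               arrays: Optional[Sequence[int]] = None) -> "CodelinkSet":
        # Index all three components with the same probe indices (and the
        # two matrices with the same array indices) so they stay aligned.
        n_probes, n_arrays = self.dim()
        rows = range(n_probes) if probes is None else probes
        cols = range(n_arrays) if arrays is None else arrays
        return CodelinkSet(
            val=[[self.val[i][j] for j in cols] for i in rows],
            flags=[[self.flags[i][j] for j in cols] for i in rows],
            info=[self.info[i] for i in rows],
        )
```

For example, `s.subset(probes=[0, 2], arrays=[1])` keeps probes 0 and 2 on the second array only, returning a new object whose `info` still matches its rows.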
>>
>> When my code is clearer and cleaner I will be happy to let you
>> see it.
>>
>> D
>
>