[BioC] codelink analysis

Diego Diez diez at kuicr.kyoto-u.ac.jp
Wed Aug 27 04:26:15 CEST 2008


Dear Lixia,

On Wed, Aug 27, 2008 at 7:00 AM, Diao,Lixia <ldiao at mdanderson.org> wrote:
> Dear Dr. Diez,
>
> I am trying to use the codelink package to analyze Codelink arrays. It is indicated that the codelink package only recognizes text files exported from the Codelink software. We have 32 txt files. When I used:
> files<-list.files(pattern="txt")
> data<-readCodelink(files=files)
>
> it returns:
>
> Error in readHeader(files[n], dec = TRUE) : Not a Codelink exported file.
>
> The first several rows of files are:
>
> PROJECT
> EXPERIMENT
> SAMPLE  Sample001
> DATE    2007-10-26T17:19:48
>
>
> GENEID  NCBI_ACCESSION  TYPE_FLAG       LO_ARRAY_ID     EXPRESSIONVALUE NORMALIZEDEXPRESSIONVALUE       GENESPRINGFLAG  CODELINKFLAG    SPOT_COL        SPOT_ROW        SPOTMEAN        SPOTMEDIAN      BKGMEAN BKGMEDIAN
> .......
>
>
> I emailed the contact person at the company. They claimed that this txt file comes directly
> from the Codelink system without any manipulation. I checked the code of readCodelink, and it
> seems that fields such as product, number of genes, ... are expected. Could you help me with this?
> Does the file need more header fields to be recognized by the codelink package?

Well, this looks like data coming from a Codelink analysis, but it is
not in the expected format, which should look something like this:

<--- file start --->
CodeLink Expression Analysis 4.1.0.29054
D Diez Report for Slide (T00298850)
LAYOUT  EXP287X128-950.22.ID
PROJECT RATA BEATRIZ
EXPERIMENT
PRODUCT Rat Whole Genome
Sample Name Array 1     T3-5(3)
Median Array 1  31,0905990600586
Report( 1 ): Adultos
--------------------------------------------------------------------------------
Idx     Probe_name      Probe_type      Feature_id      Raw_intensity   Normalized_intensity    Quality_flag    Signal_strength Logical_row     Logical_col     Center_X        Center_Y        Spot_mean       Spot_median     Spot_stdev      Spot_area       Spot_diameter   Spot_noise_level        Bkgd_mean       Bkgd_median     Bkgd_stdev      Bkgd_area       Array   Sample_name
1       GE200017        FIDUCIAL        1001    1614,4359       51,9268 G       37,7907 1       1       135     137     1645,4359       1622,0000       1261,6964       119     12,3091630935669        43,5407773523152        32,7917 31,0000 8,3605  124     1       T3-5(3)
<--- snip --->

As you can see, in your case all field names are uppercase, and some
of them don't correspond to the expected ones. A few fields are
mandatory, but the header is mostly unnecessary, although it is used
to check the file format (in particular the first line). One
possibility is that your files are in a new Codelink format: since
Codelink passed from GE Healthcare to Applied, I haven't been able to
follow the changes they may have introduced into the file format. In
particular, the new Codelink Software (v5), which is used to export
the text files, may have changed the original format. On the other
hand, the field GENESPRINGFLAG looks suspicious, as if the data had
been processed with GeneSpring, but again I am just guessing. Did you
just copy/paste the header sample you included?
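
To see quickly which of your 32 files would pass a first-line check,
something along these lines may help. It is only a rough sketch in
base R: the helper first_line_is_codelink() and the "^CodeLink"
pattern are my assumptions, based on the sample header above, and not
the actual test that readCodelink performs. The demo writes two small
temporary files so it runs on its own.

```r
# Hypothetical helper (not part of the codelink package): TRUE when the
# first line of a file starts with the assumed "CodeLink" signature.
first_line_is_codelink <- function(path, signature = "^CodeLink") {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 && grepl(signature, first)
}

# Two temporary files: one mimicking the expected header, one
# mimicking the uppercase header from the question.
good <- tempfile(fileext = ".txt")
bad  <- tempfile(fileext = ".txt")
writeLines(c("CodeLink Expression Analysis 4.1.0.29054",
             "PROJECT\tRATA BEATRIZ"), good)
writeLines(c("PROJECT", "EXPERIMENT", "SAMPLE\tSample001"), bad)

files <- c(good, bad)
ok <- vapply(files, first_line_is_codelink, logical(1), USE.NAMES = FALSE)
print(data.frame(file = basename(files), recognized = ok))
```

With real data, files would come from list.files(pattern = "\\.txt$")
instead of the two temporary files.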

If your data really is in a genuine Codelink format, I will try to add
support for it, but I will need some extra information.
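
One way to gather that kind of information is to dump the first few
lines of a file together with the number of tab-separated fields on
each line. The sketch below uses base R only; header_preview() is a
hypothetical helper of mine, and the assumption that the files are
tab-delimited comes from the sample you pasted. The demo file is
made up for illustration.

```r
# Hypothetical helper: summarize the first n lines of a text file,
# counting tab-separated fields on each line.
header_preview <- function(path, n = 10) {
  lines <- readLines(path, n = n, warn = FALSE)
  data.frame(
    line     = seq_along(lines),
    n_fields = lengths(strsplit(lines, "\t", fixed = TRUE)),
    text     = substr(lines, 1, 60),  # truncate long lines for display
    stringsAsFactors = FALSE
  )
}

# Demo on a small temporary file shaped like the header in the question.
demo <- tempfile(fileext = ".txt")
writeLines(c("PROJECT", "EXPERIMENT", "SAMPLE\tSample001", "",
             "GENEID\tNCBI_ACCESSION\tTYPE_FLAG"), demo)

preview <- header_preview(demo)
print(preview)
```

The line with the largest n_fields is usually the column header, which
makes it easy to tell where the file header ends.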


Best,
Diego.



-- 
Diego Diez, Ph
Bioinformatics center,
Institute for Chemical Research,
Kyoto University.
Gokasho, Uji, Kyoto 611-0011 JAPAN
diez at kuicr.kyoto-u.ac.jp



>
> Thanks a lot,
> Lixia
>