[BioC] questions on the ImaGene data using limma package

Fri Oct 31 14:39:09 CET 2008

>Dear Gordon:

I am re-posting my message below. I think my earlier attempt to cc the 
"Bioconductor mailing list" <bioconductor at stat.math.ethz.ch> did not 
go through, which is why the message never reached the list.

>Thanks a lot for your comments and suggestions. Following your 
>suggestion, I have now read all the data into limma objects with the 
>generic method, using the attached targets file that I edited from 
>the annotation file I sent you earlier. I assumed, as you guessed, 
>that the Cy3 channel is the common reference.
>
>The remaining issue is the one you raised: how the experiment was 
>actually done. According to the E-NCMF-8.idf.txt file from 
>ArrayExpress, it is a dye_swap_design, which is exactly what you 
>guessed. So the data do appear to have been collated by ArrayExpress 
>into data matrices with the Cy3 and Cy5 intensities in the same file 
>for each sample. My concern is the "Label" column in the 
>E-NCMF-8_sdrf.txt file I sent you in my last email: what do the Cy3 
>and Cy5 entries mean for each sample? It looks as though this column 
>gives, for each sample (and its raw data file), the dye used for the 
>sample, with the other dye used for the common reference, which is 
>not mentioned in the annotation file. What do you think? If this is 
>true, I may need to change my targets file accordingly. This 
>assumption would at least explain the repeated samples in the 
>dataset, which would be the dye-swap arrays.
>
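The dye-swap reading of the "Label" column can be expressed in limma terms. What follows is a minimal sketch, assuming a hypothetical pair of arrays in which one tumour sample is hybridized against a common reference and then dye-swapped. The file names and the pairing are illustrative only and are not taken from E-NCMF-8. modelMatrix() is limma's helper for turning the Cy3 and Cy5 columns of a targets frame into a design matrix.

```r
library(limma)

# Hypothetical two-array layout (an assumption, not the real E-NCMF-8 design):
# the same sample hybridized twice against a common reference,
# with the dyes swapped on the second array.
targets <- data.frame(
  FileName = c("array1.txt", "array2.txt"),
  Cy3      = c("Reference", "SCC3560"),
  Cy5      = c("SCC3560", "Reference"),
  stringsAsFactors = FALSE
)

# With a common reference, each non-reference sample gets one coefficient.
# The dye-swapped array enters the design with the opposite sign.
design <- modelMatrix(targets, ref = "Reference", verbose = FALSE)
print(design)
```

If the "Label" column really does record which dye carried the sample, the Cy3 and Cy5 columns of the targets file would be filled in from it row by row, and the signs in the design matrix would then follow automatically.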
>I have tried to contact the authors of the dataset for details of the 
>experimental design, which should help to sort this out.
>
>By the way, I am not sure why my post did not go to the mailing list. 
>I changed the address slightly this time and hope it works.
>
>Thanks again for your help. Any additional suggestion would be 
>appreciated as well.
>
>Best regards,
>
>Ming Yi
ABCC
P.O.Box B, Bldg 430
National Cancer Institute/SAIC-Frederick, Inc
Frederick,Maryland
USA

>At 09:25 PM 10/29/2008, Gordon K Smyth wrote:
>>Dear Ming,
>>Thank you for mailing me example data sets and the annotation 
>>spreadsheet from ArrayExpress.
>>You are assuming that the data from ArrayExpress are in ImaGene 
>>format. This is incorrect.  The reason that limma gives a special 
>>treatment to ImaGene files is that, unlike other image analysis 
>>software, ImaGene writes the Cy3 and Cy5 channels into separate 
>>files.  However ArrayExpress has collated the original data into 
>>data matrices with the Cy3 and Cy5 intensities in the same file for 
>>each sample.  Therefore you should ignore all references to ImaGene 
>>in the limma manual, and instead use the instructions for generic 
>>two-color platforms.
>>The data sets you sent me can easily be read into limma using the 
>>instructions in the limma User's Guide starting page 14 "What 
>>should you do if your image analysis program is not in the above 
>>list?"  I demonstrate this below.
>>Your emails suggest that you have not yet read any two-color data 
>>into limma.  It is essential that you try some simple examples 
>>before trying a large dataset from ArrayExpress, which will have a 
>>complex structure you might not fully understand.
>>I don't fully understand the sample annotation file from 
>>ArrayExpress that you sent me, but I doubt that you are 
>>interpreting it correctly.  It is not in the format you need for a 
>>limma targets file.  My guess is that each row of the file 
>>corresponds to one array, and that each array has been hybridized 
>>with a common reference that is not mentioned in the annotation 
>>file.  This means that the repeated sample names you have noted do 
>>not represent matched Cy3 and Cy5 channels, but rather represent 
>>dye-swap technical replicates.  That is, they are separate arrays.
>>If my guess is correct, then a targets file would be something like below.
>>Let me emphasize that I do not offer a plug-in service to read 
>>experimental data posted to ArrayExpress.  It is your 
>>responsibility to figure out the experimental design and the 
>>ArrayExpress data formats. I am just guessing.
>>Best wishes
>>Gordon
>>
>>READING YOUR DATA FILES
>>
>>>f
>>[1] "E-NCMF-8-raw-data-1363346838.txt" "E-NCMF-8-raw-data-1363346856.txt"
>>
>>>ann <- c("Database NCMF:DB:omadhuman",
>>         "Database ebi.ac.uk:Database:ens_trscrpt_id",
>>         "Feature coordinates: metaColumn","metaRow","column","row",
>>         "Reporter identifier","Reporter sequence type")
>>
>>>columns <- list(Rf="ImaGene:Signal Mean_Cy5",
>>                Rb="ImaGene:Background Median_Cy5",
>>                Gf="ImaGene:Signal Mean_Cy3",
>>                Gb="ImaGene:Background Median_Cy3")
>>
>>>RG <- read.maimages(files=f,annotation=ann,columns=columns)
>>Read E-NCMF-8-raw-data-1363346838.txt
>>Read E-NCMF-8-raw-data-1363346856.txt
>>
>>>dim(RG)
>>[1] 37632     2
>>
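For readers without the ArrayExpress files, the same generic-platform reading step can be tried on made-up data. This is a minimal sketch, assuming two tiny tab-delimited files whose column headings imitate the ones in this thread. The file names, probe IDs and intensities are invented for illustration. The current read.maimages help page names the foreground entries R and G, so those names are used here.

```r
library(limma)

# Write two tiny tab-delimited files that mimic the collated layout:
# both channels side by side in one file per array.
# File names, probe IDs and intensities are made up for illustration.
demo_dir <- tempfile("generic_demo")
dir.create(demo_dir)
make_file <- function(name, shift) {
  tab <- data.frame(
    "Reporter identifier"           = c("p1", "p2", "p3"),
    "ImaGene:Signal Mean_Cy5"       = c(500, 1200, 300) + shift,
    "ImaGene:Background Median_Cy5" = c(50, 60, 55),
    "ImaGene:Signal Mean_Cy3"       = c(450, 800, 320) + shift,
    "ImaGene:Background Median_Cy3" = c(40, 45, 50),
    check.names = FALSE
  )
  write.table(tab, file.path(demo_dir, name), sep = "\t",
              quote = FALSE, row.names = FALSE)
}
make_file("array1.txt", 0)
make_file("array2.txt", 10)

# Map the red/green foreground and background to the column headings.
columns <- list(R  = "ImaGene:Signal Mean_Cy5",
                Rb = "ImaGene:Background Median_Cy5",
                G  = "ImaGene:Signal Mean_Cy3",
                Gb = "ImaGene:Background Median_Cy3")

# source = "generic" means: no special parser, just use `columns`.
RG <- read.maimages(files = c("array1.txt", "array2.txt"), path = demo_dir,
                    source = "generic", columns = columns,
                    annotation = "Reporter identifier", verbose = FALSE)
dim(RG)   # rows = probes, columns = arrays
```

The point of the sketch is that one file per array, holding both channels, is read with source="generic" and a columns mapping, not with source="imagene".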
>>A POSSIBLE TARGETS FILE
>>
>>>targets <- readTargets()
>>>targets
>>                       Source            DiseaseState              ArrayDataMatrixFile       Cy3       Cy5
>>1                       3560 Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346838.txt Reference   SCC3560
>>2 reference pool of 61 HNSCC Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346856.txt Reference PoolHNSCC
>>
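The printout above does not show the targets file itself. As a minimal sketch, assuming the file is tab-delimited with the same five columns, it could be written and read back as follows. The Cy3 and Cy5 assignments are the guess made in this thread and have not been confirmed against the experiment; the temporary file path is only for illustration.

```r
library(limma)

# Tab-delimited targets file with the same columns as the printout above.
# The Cy3/Cy5 assignments are the guess from this thread, not confirmed.
target_lines <- c(
  "Source\tDiseaseState\tArrayDataMatrixFile\tCy3\tCy5",
  "3560\tSquamous Cell Carcinoma\tE-NCMF-8-raw-data-1363346838.txt\tReference\tSCC3560",
  "reference pool of 61 HNSCC\tSquamous Cell Carcinoma\tE-NCMF-8-raw-data-1363346856.txt\tReference\tPoolHNSCC"
)
targets_file <- tempfile(fileext = ".txt")
writeLines(target_lines, targets_file)

# readTargets() reads a tab-delimited file with a header row.
targets <- readTargets(targets_file)
targets[, c("ArrayDataMatrixFile", "Cy3", "Cy5")]
```

Calling readTargets() with no arguments, as in the session above, looks for a file named Targets.txt in the working directory.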
>>On Wed, 29 Oct 2008, Ming YI [Contr] wrote:
>>
>>>Hi, Dear Gordon:
>>>I am trying to use limma with an ImaGene dataset that I downloaded 
>>>from ArrayExpress. I have never dealt with ImaGene data before and 
>>>am not familiar with the format, except that the Cy5 and Cy3 
>>>signals are stored in two separate files for the same sample. I 
>>>tried to read the data into limma and normalize them there, but I 
>>>keep running into issues and errors. I hope you can help me in 
>>>this regard:
>>>I have attached a file (E-NCMF-8_sdrf.txt), downloaded from 
>>>ArrayExpress, that could be used to make the targets file, and 
>>>also two raw data files from the ImaGene dataset as examples. 
>>>What bothers me is the following:
>>>Extract 3538 and Extract 3526 (see column "Extract Name" of the 
>>>E-NCMF-8_sdrf.txt file) each have one Cy5 file and one matched Cy3 
>>>file, so those are fine. But for "Extract reference pool of 61 
>>>HNSCC" (see the E-NCMF-8_sdrf.txt file) there are multiple Cy3 and 
>>>Cy5 files. How should we incorporate that into the targets file?
>>>I intended to use the following code to deal with this ImaGene data:
>>>targets<-readTargets()
>>>files<-targets[,c("FileNameCy3", "FileNameCy5")]
>>>RG<-read.maimages(files, source="imagene")
>>>but I need the right targets file to start with, particularly given 
>>>the issue I mentioned above.
>>>Also, for normalization, is
>>>RG<-backgroundCorrect(RG, method="normexp", offset=50) still 
>>>appropriate for ImaGene data?
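The thread does not settle whether these settings suit this dataset, so the following only shows how the call fits into a two-color workflow. It is a minimal sketch on simulated intensities, assuming an RGList built by hand. The simulation parameters are arbitrary, and method="loess" is chosen here because the toy data carry no print-tip layout.

```r
library(limma)

# Simulated two-array RGList (arbitrary numbers, for illustration only):
# observed foreground = background + exponentially distributed signal.
set.seed(1)
n <- 500
sim_bg  <- function() matrix(rnorm(2 * n, mean = 100, sd = 10), n, 2)
sim_sig <- function() matrix(rexp(2 * n, rate = 1 / 1000), n, 2)
Rb <- sim_bg()
Gb <- sim_bg()
RG <- new("RGList", list(R = Rb + sim_sig(), G = Gb + sim_sig(),
                         Rb = Rb, Gb = Gb))

# normexp background correction, then add an offset of 50 to both
# channels to damp the variability of low-intensity log-ratios.
RGb <- backgroundCorrect(RG, method = "normexp", offset = 50,
                         verbose = FALSE)

# Within-array loess normalization gives an MAList of log-ratios (M)
# and average log-intensities (A).
MA <- normalizeWithinArrays(RGb, method = "loess")
summary(as.vector(MA$M))
```

Because backgroundCorrect() works on the foreground and background matrices of the RGList, it does not depend on which image analysis program produced them once the data have been read in.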
>>>Thanks so much for your help!
>>>Ming Yi
>>>ABCC
>>>P.O.Box B, Bldg 430
>>>National Cancer Institute/SAIC-Frederick, Inc
>>>Frederick,Maryland
>>>USA