[BioC] questions on the ImaGene data using limma package
Ming YI [Contr]
myi at ncifcrf.gov
Fri Oct 31 14:39:09 CET 2008
>Dear Gordon:
I am re-posting my issue below. I think I cannot cc the "Bioconductor
mailing list" <bioconductor at stat.math.ethz.ch>, which is why my earlier
post failed to reach the mailing list.
>Thanks a lot for your comments and suggestions. Following your
>suggestion, I have already successfully read all the data into limma
>objects using the generic method, with the attached targets
>file that I edited from their annotation file, which I sent you earlier. I
>did assume that Cy3 is the common reference channel, as you guessed.
>
>But the question you raised remains: how did they actually do the
>experiment? Based on their E-NCMF-8.idf.txt file from
>ArrayExpress, it appears to be a dye_swap_design, which is exactly
>what you guessed. So the data appear to have been collated by ArrayExpress
>into data matrices with the Cy3 and Cy5 intensities in the same file
>for each sample. My concern is the "Label" column in the
>file E-NCMF-8_sdrf.txt that I sent you in my last email: what do the
>Cy3 and Cy5 entries mean for each sample? It looks like this column may
>give, for each sample (and corresponding raw data file), the dye used for
>the sample, with the other dye used for the common reference,
>which is not mentioned in their annotation file. What do you think?
>If this is true, I may need to change my targets file accordingly to
>accommodate this information. This assumption makes more sense, at
>least in explaining the repeated samples in the dataset, which should
>be the dye-swapped data.
>
>I have tried to contact them for details of the experimental design,
>which should help to sort this out.
>
>By the way, I am not sure why my post did not go to the mailing list. I
>changed the address a bit this time and hope it works.
>
>Thanks again for your help. Any additional suggestions would be
>appreciated as well.
>
>Best regards,
>
>Ming Yi
ABCC
P.O.Box B, Bldg 430
National Cancer Institute/SAIC-Frederick, Inc
Frederick,Maryland
USA
>At 09:25 PM 10/29/2008, Gordon K Smyth wrote:
>>Dear Ming,
>>Thank you for mailing me example data sets and the annotation
>>spreadsheet from ArrayExpress.
>>You are assuming that the data from ArrayExpress are in ImaGene
>>format. This is incorrect. The reason that limma gives a special
>>treatment to ImaGene files is that, unlike other image analysis
>>software, ImaGene writes the Cy3 and Cy5 channels into separate
>>files. However ArrayExpress has collated the original data into
>>data matrices with the Cy3 and Cy5 intensities in the same file for
>>each sample. Therefore you should ignore all references to ImaGene
>>in the limma manual, and instead use the instructions for generic
>>two-color platforms.
>>The data sets you sent me can easily be read into limma using the
>>instructions in the limma User's Guide starting page 14 "What
>>should you do if your image analysis program is not in the above
>>list?" I demonstrate this below.
>>Your emails suggest that you have not yet read any two-color data
>>into limma. It is essential that you try some simple examples
>>before trying a large dataset from ArrayExpress, which will have a
>>complex structure you might not fully understand.
>>I don't fully understand the sample annotation file from
>>ArrayExpress that you sent me, but I doubt that you are
>>interpreting it correctly. It is not in the format you need for a
>>limma targets file. My guess is that each row of the file
>>corresponds to one array, and that each array has been hybridized
>>with a common reference that is not mentioned in the annotation
>>file. This means that the repeated sample names you have noted do
>>not represent matched Cy3 and Cy5 channels, but rather represent
>>dye-swap technical replicates. That is, they are separate arrays.
>>If my guess is correct, then a targets file would be something like below.
>>Let me emphasize that I do not offer a plug-in service to read
>>experimental data posted to ArrayExpress. It is your
>>responsibility to figure out the experimental design and the
>>ArrayExpress data formats. I am just guessing.
>>Best wishes
>>Gordon
>>
>>READING YOUR DATA FILES
>>
>>>f
>>[1] "E-NCMF-8-raw-data-1363346838.txt" "E-NCMF-8-raw-data-1363346856.txt"
>>
>>>ann <- c("Database NCMF:DB:omadhuman",
>>         "Database ebi.ac.uk:Database:ens_trscrpt_id",
>>         "Feature coordinates: metaColumn","metaRow","column","row",
>>         "Reporter identifier","Reporter sequence type")
>>
>>>columns <- list(Rf="ImaGene:Signal Mean_Cy5",
>>                 Rb="ImaGene:Background Median_Cy5",
>>                 Gf="ImaGene:Signal Mean_Cy3",
>>                 Gb="ImaGene:Background Median_Cy3")
>>
>>>RG <- read.maimages(files=f,annotation=ann,columns=columns)
>>Read E-NCMF-8-raw-data-1363346838.txt
>>Read E-NCMF-8-raw-data-1363346856.txt
>>
>>>dim(RG)
>>[1] 37632 2
>>
>>A POSSIBLE TARGETS FILE
>>
>>>targets <- readTargets()
>>>targets
>>                      Source            DiseaseState              ArrayDataMatrixFile       Cy3       Cy5
>>1                       3560 Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346838.txt Reference   SCC3560
>>2 reference pool of 61 HNSCC Squamous Cell Carcinoma E-NCMF-8-raw-data-1363346856.txt Reference PoolHNSCC
>>
>>On Wed, 29 Oct 2008, Ming YI [Contr] wrote:
>>
>>>Hi, Dear Gordon:
>>>I am trying to use limma with an ImaGene dataset I downloaded
>>>from ArrayExpress. I have never dealt with ImaGene data before and am
>>>not familiar with the ImaGene data format, except for knowing that the
>>>Cy5 and Cy3 signals are stored in two separate files for the same sample.
>>>I tried to read the data into limma and normalize them within
>>>limma, but I keep running into issues and errors. I
>>>hope you can help me in this regard.
>>>I have attached a file (E-NCMF-8_sdrf.txt), downloaded from
>>>ArrayExpress, that could potentially be used to make the targets file,
>>>and I have also attached two raw data files from the ImaGene dataset as
>>>examples. What bothers me is the following:
>>>Extract 3538 and Extract 3526 (see column "Extract Name" of the
>>>E-NCMF-8_sdrf.txt file) each have one Cy5 file and one matched Cy3
>>>file, so that's fine with me. But for "Extract
>>>reference pool of 61 HNSCC" in particular (see the E-NCMF-8_sdrf.txt
>>>file), there are multiple Cy3 and Cy5 files. How should we
>>>incorporate that into the targets file?
>>>I intended to use the following code to deal with this ImaGene data
>>>targets<-readTargets()
>>>files<-targets[,c("FileNameCy3", "FileNameCy5")]
>>>RG<-read.maimages(files, source="imagene")
>>>but I need the right targets file to start with, particularly given
>>>the issue I mentioned above.
>>>Also, for normalization, is
>>>RG<-backgroundCorrect(RG, method="normexp", offset=50) still
>>>appropriate for ImaGene data?
>>>Thanks so much for your help!
>>>Ming Yi
>>>ABCC
>>>P.O.Box B, Bldg 430
>>>National Cancer Institute/SAIC-Frederick, Inc
>>>Frederick,Maryland
>>>USA