[BioC] different gal files using limma

Tue Sep 11 23:26:52 CEST 2007

Dear Dr. Smyth,

Thanks for your help. I read in the gpr files for each of the 2 gal files 
separately, then assigned spot types, normalized, and removed all control 
spots separately, keeping only the gene spots for further analysis. MA1 and 
MA2 use the same gene IDs; however, MA2$genes$ID has 8 more genes than 
MA1$genes$ID. I used your code to match MA1 to MA2:

m <- match(MA2$genes$ID, MA1$genes$ID)
MA <- cbind(MA1[m,], MA2)

I compared MA2 with the MA2 part of MA, and the numbers are identical. 
However, there are some "NA" values in MA$genes$ID in place of gene IDs from 
MA2$genes$ID, because MA1 and MA2 do not have the same length or the same 
set of IDs. Can I still use MA? There are 4 duplicate spots per gene on the 
array.
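
A minimal base-R sketch of where such NA entries can come from, using made-up 
IDs in place of MA1$genes$ID and MA2$genes$ID (the vectors id1 and id2 are 
hypothetical, not taken from the real arrays). match() returns NA for every ID 
that has no partner in the first vector, and for duplicated IDs it always 
returns the position of the first occurrence:

```r
# Toy IDs standing in for MA1$genes$ID and MA2$genes$ID (hypothetical values).
id1 <- c("g1", "g1", "g2", "g2")
id2 <- c("g2", "g2", "g1", "g1", "g3", "g3")  # "g3" exists only in the second set

# Same call as in the thread, applied to the toy vectors.
m <- match(id2, id1)

# IDs of the second set with no partner in the first set (their m is NA).
unmatched <- unique(id2[is.na(m)])

# TRUE if several spots of the second set point at the same row of the first,
# which happens when IDs are duplicated, because match() only finds the first hit.
repeated <- any(duplicated(m[!is.na(m)]))
```

Indexing with an m that contains NA produces rows filled with NA, which would 
be consistent with the NA entries seen in MA$genes$ID. The repeated rows are 
easier to miss, since they produce no NA at all.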

I combined the 2 target files into a new target file and used it to build 
the design matrix for the linear model. Is that OK?
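
As a rough sketch of the bookkeeping involved, assuming the combined targets 
must list the arrays in the same order as the columns of cbind(MA1[m,], MA2): 
the two small tables below are hypothetical stand-ins for the two target files, 
and array_names is a stand-in for the column names of the combined object.

```r
# Hypothetical target tables, one per GAL file (values are illustrative only).
targets1 <- data.frame(Name = c("N0N121", "N0N122"), Cy3 = "N0", Cy5 = "N1",
                       stringsAsFactors = FALSE)
targets2 <- data.frame(Name = c("N1H121", "N1H122"), Cy3 = "N1", Cy5 = "H1",
                       stringsAsFactors = FALSE)

# Stack in the same order as the cbind() call: first batch, then second batch.
targets <- rbind(targets1, targets2)

# Stand-in for the array (column) names of the combined data object.
array_names <- c("N0N121", "N0N122", "N1H121", "N1H122")

# Stop early if rows of targets and columns of the data are out of step.
stopifnot(identical(targets$Name, array_names))
```

With limma loaded, a call along the lines of modelMatrix(targets, ref = "N1") 
could then be tried on the combined table; whether "N1" is the right reference 
depends on the experiment.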

Sincerely,

Tiandao

On Tue, 11 Sep 2007, Gordon Smyth wrote:

Dear Tiandao,

Dealing with multiple gal files is very tricky, but possible. In limma, you need
to read in the GPR files for each GAL file separately, identify control spots
separately, and normalize separately. So, if you have two GAL files, you will
end up with two normalized MAList objects MA1 and MA2.

You will then need to align MA1 and MA2 by gene ID. There is a merge command,
but very often the situation is too complex for this command to handle. Usually
you will need to remove the control spots from MA1 and MA2 separately, to get
down to a list of common genes, then sort MA1 to match the gene order of MA2,
then cbind them together.

If MA1 and MA2 are of the same length, with the same gene IDs, then something
like this will do the merge:

   m <- match(MA2$genes$ID, MA1$genes$ID)
   MA <- cbind(MA1[m,], MA2)
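
When the two objects do not hold exactly the same spots, or when each gene is 
spotted several times, the rows need to be reduced to a common set first. The 
following base-R sketch shows one possible way to do that on toy ID vectors. 
The helper spot_key() and the vectors id1 and id2 are hypothetical, and pairing 
the k-th occurrence of an ID on one array with the k-th occurrence on the other 
is an assumption that may not reflect the actual print layout.

```r
# Hypothetical helper: label each spot "<ID>.<k>", where k counts the
# occurrences of that ID from the top of the vector.
spot_key <- function(id) {
  k <- ave(seq_along(id), id, FUN = seq_along)
  paste(id, k, sep = ".")
}

# Toy IDs standing in for MA1$genes$ID and MA2$genes$ID.
id1 <- c("g1", "g2", "g1", "g2")
id2 <- c("g2", "g1", "g3", "g2", "g1", "g3")  # "g3" has no partner in id1

key1 <- spot_key(id1)
key2 <- spot_key(id2)

# Rows of the second object that have a partner in the first ...
keep2 <- which(key2 %in% key1)
# ... and the matching rows of the first object, in the second object's order.
m <- match(key2[keep2], key1)

# After this step the two sets of IDs should line up row by row.
stopifnot(!anyNA(m), identical(id1[m], id2[keep2]))
```

On real data the analogous final step would be something like 
MA <- cbind(MA1[m,], MA2[keep2,]), so that both objects have the same number 
of rows before they are bound together.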

There is an alternative method, which is to use the printorder() function to
map spots back to the original 384-well plate positions, then align the arrays
by 384-well plate. This method requires that the plates were used in the same
order throughout the printing, except for control plates.

You need to be very careful!
Good luck.
Gordon

> Date: Sun, 9 Sep 2007 14:26:47 -0500 (CDT)
> From: Tiandao Li <Tiandao.Li at usm.edu>
> Subject: [BioC] different gal files using limma
> To: Bioconductor_help <bioconductor at stat.math.ethz.ch>
> Message-ID: <Pine.LNX.4.64.0709091401440.32134 at orca.st.usm.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> 
> Hello,
> 
> I am analyzing cDNA microarray data using limma. I generated the GAL file
> using the program that comes with chipwriter, and everything looked fine.
> However, when I printed the first batch of chips, after the last dip of the
> pins in the first plate (print, wash), the pins dipped into the first plate
> again from the beginning (print, wash), and only then did the printer stop
> for the plate change. The company gave us a patch to solve this problem. So
> the gal file for the first batch is a little different from the one for the
> remaining batches of chips: the locations of genes, MSP, and controls differ
> (5%). After hybridization, I used GenePix Pro 6.1 for spotfinding. After
> reading the data into limma, I want to use the MSP and control spots for
> normalization. I don't know how to label the different gal files using
> readSpotTypes() across all chips.
> 
> Thanks,
> 
> Tiandao
> 
> I am kind of new to R and limma. The following is my setting.
> 
> > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
> 
> other attached packages:
>  statmod    limma
>  "1.3.0" "2.10.5"
> 
> Code for analysis
> 
> library(limma)
> 
> A <- list(R="F635 Median",G="F532 Median",Rb="B635",Gb="B532")
> B <- list("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
>   "F635 Median", "F635 Mean", "F635 SD", "F635 CV",
>   "B635", "B635 Median", "B635 Mean", "B635 SD", "B635 CV",
>   "% > B635+1SD", "% > B635+2SD", "F635 % Sat.",
>   "F532 Median", "F532 Mean", "F532 SD", "F532 CV",
>   "B532", "B532 Median", "B532 Mean", "B532 SD", "B532 CV",
>   "% > B532+1SD", "% > B532+2SD", "F532 % Sat.",
>   "Ratio of Medians (635/532)", "Ratio of Means (635/532)",
>   "Median of Ratios (635/532)", "Mean of Ratios (635/532)",
>   "Ratios SD (635/532)", "Rgn Ratio (635/532)", "Rgn R2 (635/532)",
>   "F Pixels", "B Pixels", "Circularity",
>   "Sum of Medians (635/532)", "Sum of Means (635/532)",
>   "Log Ratio (635/532)", "F635 Median - B635", "F532 Median - B532",
>   "F635 Mean - B635", "F532 Mean - B532",
>   "F635 Total Intensity", "F532 Total Intensity",
>   "SNR 635", "SNR 532", "Flags", "Normalize", "Autoflag")
> 
> # read 6 test files
> targets <- readTargets(file="targets.txt", row.names="Name") # 6 test files
> RG <-
> read.maimages(targets$FileName,source="genepix",ext="gpr",columns=A,other.columns=B)
> spottypes <- readSpotTypes("spottypes3.txt") # short spot types
> RG$genes$Status <- controlStatus(spottypes,RG)
> 
> targets
> SlideNumber     FileName        Cy3     Cy5     Name
> 1       13582917        N0      N1      N0N121
> 2       13582918        N0      N1      N0N122
> 3       13590446        N0      N1      N0N123
> 4       13590420        N1      H1      N1H121
> 5       13590521        N1      H1      N1H122
> 6       13591193        N1      H1      N1H123
> 
> spottypes3
> SpotType        ID      Color
> gene    *       black
> Calibration     Calib*  blue
> Ratio   Ratio*  red
> Negative        Neg*|Util*      brown
> MSP     MSP     orange
> Alexa   Alexa*  yellow
> blank   NotDefined      green