[BioC] different gal files using limma

Wed Sep 12 05:04:50 CEST 2007

Dear Dr. Smyth,

MA2 has the full set of IDs (2716 genes), while MA1 has 2708 genes, 8 
fewer than the full set. I want to match MA1 to MA2; however, the new 
MA$genes$ID contains 8 "NA" entries instead of the IDs from MA2. The rest 
are the same. I will check whether there is any difference between MA1 
and the MA1 part of the new MA.

I am new to R and limma, so I imported the entire gpr files and exported 
them again to check whether I had done anything wrong. I used some of the 
columns as quality controls. Everything is fine except that 
"Log Ratio (635/532)" is sometimes read as "character" instead of 
"numeric".

Since I had 2 target files for reading in the gpr files, I have now put 
them together to create a new target file, and I use it to build the 
design matrix for the linear model. Is it OK?
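
One thing that can be checked for a combined target file is that its rows
follow the same order as the arrays in the combined data object. The sketch
below uses base R only; the file names, array labels and the array_names
vector are made up for illustration and stand in for the real targets files
and for the column names of the merged MAList.

```r
# Made-up target tables standing in for the two targets files.
targets1 <- data.frame(FileName = c("a1.gpr", "a2.gpr"),
                       Cy3 = c("N0", "N0"), Cy5 = c("N1", "N1"),
                       row.names = c("A1", "A2"))
targets2 <- data.frame(FileName = "b1.gpr",
                       Cy3 = "N1", Cy5 = "H1",
                       row.names = "B1")

# Stack the tables in the same order as the arrays were combined,
# i.e. the first set of arrays first, then the second set.
targets <- rbind(targets1, targets2)

# Hypothetical stand-in for the array labels of the combined object.
array_names <- c("A1", "A2", "B1")

# TRUE only if row i of targets describes array i of the data.
order_ok <- identical(rownames(targets), array_names)
```

If order_ok is FALSE, a design matrix built from the combined targets would
describe the wrong arrays.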

Sincerely,

Tiandao

On Wed, 12 Sep 2007, Gordon Smyth wrote:

Dear Tiandao,

It doesn't necessarily make sense to try to merge MALists if they aren't the
same length and don't have the same IDs. I suggest you get down to a subset of
probes for which this is true, then try the merge command again. This assumes
that the ID column of RG$genes has unambiguous identifiers for each probe. (I
can't give you a lot of detail, because trying to troubleshoot this over email
is very hard.)
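
As a rough illustration of getting down to a common subset, here is a minimal
sketch in base R. The ID vectors are made up and stand in for MA1$genes$ID and
MA2$genes$ID, and occurrence_key() is a hypothetical helper, not a limma
function. It assumes that replicate spots of a gene appear in a comparable
order on both layouts, which would need to be checked on the real arrays.

```r
# Toy stand-ins for MA1$genes$ID and MA2$genes$ID (made-up IDs).
id1 <- c("g1", "g2", "g1", "g2", "g4")
id2 <- c("g2", "g1", "g3", "g2", "g1", "g4")

# Hypothetical helper: tag each ID with its occurrence number so that
# replicate spots get distinct keys ("g1.1", "g1.2", ...).
occurrence_key <- function(id) {
  paste(id, ave(seq_along(id), id, FUN = seq_along), sep = ".")
}
key1 <- occurrence_key(id1)
key2 <- occurrence_key(id2)

# Plain match() gives NA wherever an ID in id2 is absent from id1,
# and points every replicate at the first matching spot only.
m <- match(id2, id1)

# Keep only the keys present on both layouts, in the order of the second.
common <- key2[key2 %in% key1]
i1 <- match(common, key1)
i2 <- match(common, key2)

# The two index vectors now pick out the same IDs in the same order.
aligned <- data.frame(ID1 = id1[i1], ID2 = id2[i2])
```

With the real objects, the row indices would be used along the lines of
cbind(MA1[i1,], MA2[i2,]), after confirming that aligned$ID1 and aligned$ID2
agree.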

BTW, I notice that you're reading the entire GPR files into your RGList objects.
This will make huge objects. Do you need to do that? Why not just

  RG <- read.maimages(targets,source="genepix.median",ext="gpr")

Best wishes
Gordon

At 07:26 AM 12/09/2007, Tiandao Li wrote:
> Dear Dr. Smyth,
> 
> Thanks for your help. I read in the gpr files for the 2 gal files
> separately, then assigned the spot types separately, normalized
> separately, and removed all control spots separately, keeping only the
> gene spots for further analysis. Both MA1 and MA2 use the same gene IDs;
> however, MA2$genes$ID has 8 more genes than MA1. I used your code to
> match MA1 to MA2:
> 
> m <- match(MA2$genes$ID, MA1$genes$ID)
> MA <- cbind(MA1[m,], MA2)
> 
> I compared MA2 to the MA2 part of MA and the numbers are identical;
> however, there are some "NA" entries in MA$genes$ID instead of gene IDs
> from MA2$genes$ID. Since MA1 and MA2 don't have the same length and IDs,
> can I still use this approach? There are 4 duplicate spots per gene on
> the array.
> 
> I put the 2 target files together to create a new target file, and I use
> it to build the design matrix for the linear model. Is it OK?
> 
> Sincerely,
> 
> Tiandao
> 
> On Tue, 11 Sep 2007, Gordon Smyth wrote:
> 
> Dear Tiandao,
> 
> Dealing with multiple gal files is very tricky, but possible. In limma, you
> need to read in the GPR files for each GAL file separately, identify control
> spots separately, and normalize separately. So, if you have two GAL files,
> you will end up with two normalized MAList objects, MA1 and MA2.
> 
> You will then need to align MA1 and MA2 by gene ID. There is a merge command,
> but very often the situation is too complex for this command to handle.
> Usually you will need to remove the control spots from MA1 and MA2
> separately, to get down to a list of common genes, then sort MA1 to match the
> gene order of MA2, then cbind them together.
> 
> If MA1 and MA2 are of the same length, with the same gene IDs, then something
> like this will do the merge:
> 
>    m <- match(MA2$genes$ID, MA1$genes$ID)
>    MA <- cbind(MA1[m,], MA2)
> 
> There is an alternative method, which is to use the printorder() function to
> map spots back to the original 384-well plate positions, then align the arrays
> by 384-well plate. This method requires that the plates were used in the same
> order throughout the printing, except for control plates.
> 
> You need to be very careful!
> Good luck.
> Gordon
> 
> > Date: Sun, 9 Sep 2007 14:26:47 -0500 (CDT)
> > From: Tiandao Li <Tiandao.Li at usm.edu>
> > Subject: [BioC] different gal files using limma
> > To: Bioconductor_help <bioconductor at stat.math.ethz.ch>
> > Message-ID: <Pine.LNX.4.64.0709091401440.32134 at orca.st.usm.edu>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> >
> > Hello,
> >
> > I am analyzing cDNA microarray data using limma. I generated the GAL file
> > using the program that comes with chipwriter, and everything looked
> > great. However, when I printed the first batch of chips, after the last
> > dip of the pins in the first plate, followed by printing and washing, the
> > pins were dipped again in the first plate from the beginning, printed and
> > washed again, and only then did the printer stop for the plate change.
> > The company gave us a patch to solve this problem. So the gal file for
> > this batch is a little different from that for the remaining batches of
> > chips: the locations of genes, MSP, and controls differ (5%). After
> > hybridization, I used GenePix Pro 6.1 for spotfinding. After reading the
> > data into limma, I want to use the MSP and control spots for
> > normalization. I don't know how to label the different gal files using
> > readSpotTypes() across all chips.
> >
> > Thanks,
> >
> > Tiandao
> >
> > I am kind of new to R and limma. The following is my setting.
> >
> > > sessionInfo()
> > R version 2.5.1 (2007-06-27)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > States.1252;LC_MONETARY=English_United
> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> > [7] "base"
> >
> > other attached packages:
> >  statmod    limma
> >  "1.3.0" "2.10.5"
> >
> > Code for analysis
> >
> > library(limma)
> >
> > A <- list(R="F635 Median",G="F532 Median",Rb="B635",Gb="B532")
> > B <- list("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
> >   "F635 Median", "F635 Mean", "F635 SD", "F635 CV", "B635", "B635 Median",
> >   "B635 Mean", "B635 SD", "B635 CV", "% > B635+1SD", "% > B635+2SD",
> >   "F635 % Sat.", "F532 Median", "F532 Mean", "F532 SD", "F532 CV", "B532",
> >   "B532 Median", "B532 Mean", "B532 SD", "B532 CV", "% > B532+1SD",
> >   "% > B532+2SD", "F532 % Sat.", "Ratio of Medians (635/532)",
> >   "Ratio of Means (635/532)", "Median of Ratios (635/532)",
> >   "Mean of Ratios (635/532)", "Ratios SD (635/532)", "Rgn Ratio (635/532)",
> >   "Rgn R2 (635/532)", "F Pixels", "B Pixels", "Circularity",
> >   "Sum of Medians (635/532)", "Sum of Means (635/532)",
> >   "Log Ratio (635/532)", "F635 Median - B635", "F532 Median - B532",
> >   "F635 Mean - B635", "F532 Mean - B532", "F635 Total Intensity",
> >   "F532 Total Intensity", "SNR 635", "SNR 532", "Flags", "Normalize",
> >   "Autoflag")
> >
> > # read 6 test files
> > targets<-readTargets(file="targets.txt", row.name="Name") # 6 test files
> > RG <- read.maimages(targets$FileName, source="genepix", ext="gpr",
> >                     columns=A, other.columns=B)
> > spottypes <- readSpotTypes("spottypes3.txt") # short spot types
> > RG$genes$Status <- controlStatus(spottypes,RG)
> >
> > targets
> > SlideNumber     FileName        Cy3     Cy5     Name
> > 1       13582917        N0      N1      N0N121
> > 2       13582918        N0      N1      N0N122
> > 3       13590446        N0      N1      N0N123
> > 4       13590420        N1      H1      N1H121
> > 5       13590521        N1      H1      N1H122
> > 6       13591193        N1      H1      N1H123
> >
> > spottypes3
> > SpotType        ID      Color
> > gene    *       black
> > Calibration     Calib*  blue
> > Ratio   Ratio*  red
> > Negative        Neg*|Util*      brown
> > MSP     MSP     orange
> > Alexa   Alexa*  yellow
> > blank   NotDefined      green