[BioC] different gal files using limma
Tiandao.Li at usm.edu
Wed Sep 12 05:04:50 CEST 2007
Dear Dr. Smyth,
MA2 has the full set of IDs (2716 genes), while MA1 has 8 fewer than
the full set, 2708 genes. I want to match MA1 to MA2; however,
there are 8 "NA" values in the new MA$genes$ID instead of the IDs from MA2. The rest
of them are the same. I will check whether there is any difference between MA1
and the MA1 part of the new MA.
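
The NA values described above are what match() produces for any ID that has no partner in the other set. A minimal base-R sketch with made-up IDs (ids1 and ids2 are hypothetical stand-ins for MA1$genes$ID and MA2$genes$ID, not the real data) shows the effect and one way to restrict both sets to the common IDs before aligning:

```r
# Toy IDs standing in for MA1$genes$ID and MA2$genes$ID (hypothetical values)
ids1 <- c("g3", "g1", "g2")         # shorter set, in a different order
ids2 <- c("g1", "g2", "g3", "g4")   # full set

# match() gives NA wherever an ID in ids2 has no partner in ids1
m <- match(ids2, ids1)
print(m)

# Restrict to the IDs present in both sets, then align ids1 to the order of ids2
common <- intersect(ids2, ids1)
keep2 <- ids2 %in% common
m1 <- match(ids2[keep2], ids1)

aligned1 <- ids1[m1]
aligned2 <- ids2[keep2]
stopifnot(!anyNA(m1), identical(aligned1, aligned2))
```

With the real objects the same indices would presumably be used as MA1[m1,] and MA2[keep2,] before cbind, assuming the IDs are unique within each object. match() only returns the first hit, so with replicate spots per gene the IDs would first need to be made unique (for example by combining ID with a replicate counter).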
I am new to R and limma, so I imported the entire gpr files and exported them to
see whether I did anything wrong. I used some items as quality controls.
Everything is fine except that "Log Ratio (635/532)" sometimes comes back as
"character" instead of "numeric".
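
A column is typically read as character when at least one entry is not a number. The sketch below uses a made-up vector (log_ratio_raw and the "Error" token are assumptions for illustration, not values taken from the actual files) to show how such entries could be located and converted to NA:

```r
# Hypothetical column as it might come back from a GPR file:
# one non-numeric token is enough to make the whole column character
log_ratio_raw <- c("0.53", "-1.20", "Error", "2.01")

# Entries that cannot be parsed as numbers become NA (with a warning)
log_ratio <- suppressWarnings(as.numeric(log_ratio_raw))
bad <- is.na(log_ratio) & !is.na(log_ratio_raw)

# Show the offending entries so they can be checked against the file
print(log_ratio_raw[bad])
```

Checking which spots carry the non-numeric entries may be more informative than converting them silently.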
I had 2 target files for reading in the gpr files. I have now put the 2 target files
together to create a new target file, and use it to build the design matrix
for the linear model. Is that OK?
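
One point to check when combining target files is that the rows of the combined targets end up in the same order as the arrays (columns) of the combined data. A minimal base-R sketch, using made-up target tables (targets1, targets2, the Batch column and array_names are all hypothetical):

```r
# Hypothetical target tables, one per gal file
targets1 <- data.frame(Name = c("N0N121", "N0N122"),
                       Cy3 = "N0", Cy5 = "N1",
                       stringsAsFactors = FALSE)
targets2 <- data.frame(Name = c("N1H121", "N1H122"),
                       Cy3 = "N1", Cy5 = "H1",
                       stringsAsFactors = FALSE)

# Stack them in the same order in which the data objects are combined
targets <- rbind(targets1, targets2)

# Record which gal file each array came from, in case it is needed later
targets$Batch <- rep(c("gal1", "gal2"), c(nrow(targets1), nrow(targets2)))

# Stand-in for the array names of the combined data (assumed order: set 1, then set 2)
array_names <- c(targets1$Name, targets2$Name)
stopifnot(identical(targets$Name, array_names))
print(targets)
```

If the check fails, the targets rows can be reordered with match() before the design matrix is built.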
On Wed, 12 Sep 2007, Gordon Smyth wrote:
It doesn't necessarily make sense to try to merge MAList if they aren't the same
length and don't have the same IDs. I suggest you get down to a subset of probes
for which this is true, then try the merge command again. This assumes that the ID
column of RG$genes has unambiguous identifiers for each probe. (I can't give you
a lot of detail, because trying to troubleshoot this over the email is very
BTW, I notice that you're reading the entire GPR files into your RGList objects.
This will make huge objects. Do you need to do that? Why not just
RG <- read.maimages(targets,source="genepix.median",ext="gpr")
At 07:26 AM 12/09/2007, Tiandao Li wrote:
> Dear Dr. Smyth,
> Thanks for your help. I read in the gpr files using the 2 gal files
> separately, then identified the spot types separately, normalized
> separately, and removed all control spots separately, keeping only the gene
> type for further analysis. Both MA1 and MA2 use the same gene IDs;
> however, MA2$genes$ID has 8 more genes than MA1. I used your code to
> match MA1 to MA2
> m <- match(MA2$genes$ID, MA1$genes$ID)
> MA <- cbind(MA1[m,], MA2)
> I compared MA2 to the MA2 part of MA, and the numbers are identical; however,
> there are some "NA" values in MA$genes$ID instead of gene IDs from MA2$genes$ID.
> Since MA1 and MA2 don't have the same length or the same IDs, could I still use it?
> There are 4 duplicate spots per gene on the array.
> I put the 2 target files together to create a new target file, and use it to
> build the design matrix for the linear model. Is that OK?
> On Tue, 11 Sep 2007, Gordon Smyth wrote:
> Dear Tiandao,
> Dealing with multiple gal files is very tricky, but possible. In limma, you
> need to read in the GPR files for each GAL file separately, identify control spots
> separately, and normalize separately. So, if you have two GAL files, you will
> end up with two normalized MAList objects MA1 and MA2.
> You will then need to align MA1 and MA2 by gene ID. There is a merge command,
> but very often the situation is too complex for this command to handle.
> You will need to remove the control spots from MA1 and MA2 separately, to get
> down to a list of common genes, then sort MA1 to match the gene order of MA2,
> then cbind them together.
> If MA1 and MA2 are of the same length, with the same gene IDs, then something
> like this will do the merge:
> m <- match(MA2$genes$ID, MA1$genes$ID)
> MA <- cbind(MA1[m,], MA2)
> There is an alternative method, which is to use the printorder() function to
> map spots back to the original 384-well plate positions, then align the arrays
> by 384-well plate. This method requires that the plates were used in the same
> order throughout the printing, except for control plates.
> You need to be very careful!
> Good luck.
> > Date: Sun, 9 Sep 2007 14:26:47 -0500 (CDT)
> > From: Tiandao Li <Tiandao.Li at usm.edu>
> > Subject: [BioC] different gal files using limma
> > To: Bioconductor_help <bioconductor at stat.math.ethz.ch>
> > Message-ID: <Pine.LNX.4.64.0709091401440.32134 at orca.st.usm.edu>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> > Hello,
> > I am analyzing cDNA microarray data using limma. I generated the GAL file
> > using the program that comes with chipwriter, and everything looked fine. However,
> > when I printed the first batch of chips, after the last dip of the pins in the
> > first plate (print, wash), the pins dipped again in the first plate
> > from the beginning (print, wash), and only then stopped for the plate change. The
> > company gave us a patch to solve this problem. So the gal file for that batch is a
> > little different from that of the remaining batches of chips: the locations of genes,
> > MSP, and controls differ (5%). After hybridization, I used GenePix
> > Pro 6.1 for spotfinding. After reading the data into limma, I want to use
> > MSP and control spots for normalization. I don't know how to handle the
> > different gal files using readSpotTypes() across all chips.
> > Thanks,
> > Tiandao
> > I am kind of new to R and limma. The following is my setting.
> > > sessionInfo()
> > R version 2.5.1 (2007-06-27)
> > i386-pc-mingw32
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > States.1252;LC_MONETARY=English_United
> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> > attached base packages:
> >  "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> >  "base"
> > other attached packages:
> > statmod limma
> > "1.3.0" "2.10.5"
> > Code for analysis
> > library(limma)
> > A <- list(R="F635 Median",G="F532 Median",Rb="B635",Gb="B532")
> > B <- list("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
> > "F635 Median", "F635 Mean", "F635 SD", "F635 CV", "B635", "B635 Median",
> > "B635 Mean", "B635 SD", "B635 CV", "% > B635+1SD", "% > B635+2SD",
> > "F635 % Sat.", "F532 Median", "F532 Mean", "F532 SD", "F532 CV", "B532",
> > "B532 Median", "B532 Mean", "B532 SD", "B532 CV", "% > B532+1SD",
> > "% > B532+2SD", "F532 % Sat.", "Ratio of Medians (635/532)",
> > "Ratio of Means (635/532)", "Median of Ratios (635/532)",
> > "Mean of Ratios (635/532)", "Ratios SD (635/532)", "Rgn Ratio (635/532)",
> > "Rgn R2 (635/532)", "F Pixels", "B Pixels", "Circularity",
> > "Sum of Medians (635/532)", "Sum of Means (635/532)", "Log Ratio (635/532)",
> > "F635 Median - B635", "F532 Median - B532", "F635 Mean - B635",
> > "F532 Mean - B532", "F635 Total Intensity", "F532 Total Intensity",
> > "SNR 635", "SNR 532", "Flags", "Normalize", "Autoflag")
> > # read 6 test files
> > targets<-readTargets(file="targets.txt", row.name="Name") # 6 test files
> > RG <-
> > spottypes <- readSpotTypes("spottypes3.txt") # short spot types
> > RG$genes$Status <- controlStatus(spottypes,RG)
> > targets
> > SlideNumber FileName Cy3 Cy5 Name
> > 1 13582917 N0 N1 N0N121
> > 2 13582918 N0 N1 N0N122
> > 3 13590446 N0 N1 N0N123
> > 4 13590420 N1 H1 N1H121
> > 5 13590521 N1 H1 N1H122
> > 6 13591193 N1 H1 N1H123
> > spottypes3
> > SpotType ID Color
> > gene * black
> > Calibration Calib* blue
> > Ratio Ratio* red
> > Negative Neg*|Util* brown
> > MSP MSP orange
> > Alexa Alexa* yellow
> > blank NotDefined green