[BioC] different gal files using limma
Tiandao Li
Tiandao.Li at usm.edu
Tue Sep 11 23:26:52 CEST 2007
Dear Dr. Smyth,
Thanks for your help. Following your advice, I read in the GPR files for
each of the 2 GAL files separately, then identified the spot types,
normalized and removed all control spots separately for each set, keeping
only spots of type "gene" for further analysis. MA1 and MA2 use the same
gene IDs; however, MA2$genes$ID contains 8 more genes than MA1$genes$ID. I
used your code to match MA1 to MA2:
m <- match(MA2$genes$ID, MA1$genes$ID)
MA <- cbind(MA1[m,], MA2)
I compared MA2 with the MA2 part of MA and the numbers are identical.
However, MA$genes$ID contains some NA values instead of the gene IDs from
MA2$genes$ID, because MA1 and MA2 do not have the same length and IDs.
Can I still use MA?
There are 4 duplicate spots per gene on the array.
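Where the NA values come from can be seen with plain ID vectors. For each ID in MA2 that does not occur in MA1, match() returns NA, so MA1[m, ] gets an all-NA row at that position; since MA$genes$ID shows those NAs, the merged annotation evidently comes from MA1[m, ]. With 4 duplicate spots per gene there is a second trap: match() returns only the position of the first occurrence, so every copy of a gene in MA2 is paired with the same MA1 spot. The IDs below are hypothetical toy values and only base R is used.

```r
# Hypothetical spot IDs after control spots have been removed. Each gene is
# printed twice here (the real arrays have 4 duplicate spots per gene), and
# the second batch contains one extra gene, "g4".
id1 <- c("g1", "g1", "g2", "g2", "g3", "g3")              # stands in for MA1$genes$ID
id2 <- c("g2", "g2", "g1", "g1", "g3", "g3", "g4", "g4")  # stands in for MA2$genes$ID

m <- match(id2, id1)
print(m)  # 3 3 1 1 5 5 NA NA

# IDs found only in id2 give NA, so MA1[m, ] would contain all-NA rows there.
sum(is.na(m))  # number of MA2 spots with no partner in MA1

# match() only ever returns the first hit: rows 2, 4 and 6 of id1 (the second
# copy of each gene) are never selected.
setdiff(seq_along(id1), m)
```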
I combined the 2 targets files into a new targets file and used it to
build the design matrix for the linear model. Is that OK?
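Combining the targets files can only work if the row order of the combined targets frame matches the column (array) order of the merged object; cbind(MA1[m, ], MA2) puts the MA1 arrays first. A minimal base R sketch of that check follows, with hypothetical file names and a hypothetical stand-in for the merged array names. Once the order is confirmed, limma's modelMatrix(targets, ref = "N0") could build the design from the Cy3/Cy5 columns; that call is not run here because it needs limma.

```r
# Hypothetical targets frames for the two print batches; in practice each
# would come from readTargets() on that batch's own targets file.
targets1 <- data.frame(FileName = c("a1.gpr", "a2.gpr"),
                       Cy3 = c("N0", "N0"), Cy5 = c("N1", "N1"),
                       stringsAsFactors = FALSE)
targets2 <- data.frame(FileName = c("b1.gpr", "b2.gpr"),
                       Cy3 = c("N1", "N1"), Cy5 = c("H1", "H1"),
                       stringsAsFactors = FALSE)

# cbind(MA1[m, ], MA2) places the batch-1 arrays first, so stack the
# targets in the same order: batch 1 rows, then batch 2 rows.
targets <- rbind(targets1, targets2)

# Hypothetical stand-in for the array (column) names of the merged object.
merged_arrays <- c("a1", "a2", "b1", "b2")

# Stop if the targets rows do not correspond one-to-one to the arrays.
stopifnot(identical(sub("\\.gpr$", "", targets$FileName), merged_arrays))
print(targets)
```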
Sincerely,
Tiandao
On Tue, 11 Sep 2007, Gordon Smyth wrote:
Dear Tiandao,
Dealing with multiple gal files is very tricky, but possible. In limma, you need
to read in the GPR files for each GAL file separately, identify control spots
separately, and normalize separately. So, if you have two GAL files, you will
end up with two normalized MAList objects MA1 and MA2.
You will then need to align MA1 and MA2 by gene ID. There is a merge command,
but very often the situation is too complex for this command to handle. Usually
you will need to remove the control spots from MA1 and MA2 separately, to get
down to a list of common genes, then sort MA1 to match the gene order of MA2,
then cbind them together.
If MA1 and MA2 are of the same length, with the same gene IDs, then something
like this will do the merge:
m <- match(MA2$genes$ID, MA1$genes$ID)
MA <- cbind(MA1[m,], MA2)
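The two-line merge above assumes MA1 and MA2 contain exactly the same IDs. When one object has extra genes, or each gene is spotted several times, two extra steps are needed: restrict to IDs present in both objects, and pair duplicate spots one-to-one instead of letting match() send every copy to its first hit. The sketch below does this in base R on plain ID vectors; align_ids() is a hypothetical helper, not a limma function, and pairing replicate spots by order of appearance is an assumption.

```r
# Hypothetical helper (not part of limma): returns row indices i1, i2 such
# that MA1[i1, ] and MA2[i2, ] refer to the same gene, spot by spot. IDs that
# occur in only one object are dropped, and duplicate spots are paired by
# order of appearance (1st copy with 1st copy, 2nd with 2nd, ...), which
# assumes the replicate spots of a gene are interchangeable.
align_ids <- function(id1, id2) {
  # Tag each occurrence of an ID within its own vector: "g1#1", "g1#2", ...
  key <- function(id) {
    paste(id, ave(seq_along(id), id, FUN = seq_along), sep = "#")
  }
  k1 <- key(id1)
  k2 <- key(id2)
  i2 <- which(k2 %in% k1)   # MA2 spots that have a partner in MA1
  i1 <- match(k2[i2], k1)   # keys are unique, so this is one-to-one
  list(i1 = i1, i2 = i2)
}

# Toy IDs standing in for MA1$genes$ID and MA2$genes$ID (controls removed).
id1 <- c("g1", "g1", "g2", "g2", "g3", "g3")
id2 <- c("g2", "g2", "g1", "g1", "g3", "g3", "g4", "g4")
a <- align_ids(id1, id2)
stopifnot(identical(id1[a$i1], id2[a$i2]))  # IDs line up, no NA rows
print(a)
```

With the limma objects this would be used as a <- align_ids(MA1$genes$ID, MA2$genes$ID) followed by MA <- cbind(MA1[a$i1, ], MA2[a$i2, ]). If ndups and spacing are later passed to lmFit() or duplicateCorrelation(), check that the duplicate spots in the merged object still follow the regular layout those arguments assume.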
There is an alternative method, which is to use the printorder() function to
map spots back to the original 384-well plate positions, then align the arrays
by 384-well plate. This method requires that the plates were used in the same
order throughout the printing, except for control plates.
You need to be very careful!
Good luck.
Gordon
> Date: Sun, 9 Sep 2007 14:26:47 -0500 (CDT)
> From: Tiandao Li <Tiandao.Li at usm.edu>
> Subject: [BioC] different gal files using limma
> To: Bioconductor_help <bioconductor at stat.math.ethz.ch>
>
> Hello,
>
> I am analyzing cDNA microarray data using limma. I generated the GAL file
> with the software that comes with the ChipWriter, and everything looked
> fine. However, when I printed the first batch of chips, after the last
> dip of the pins in the first plate (print, wash), the pins were dipped
> again into the first plate from the beginning (print, wash) before the
> printing stopped for the plate change. The company gave us a patch to fix
> this problem, so the GAL file for this batch is a little different from
> the one for the later batches: the locations of genes, MSP and control
> spots differ (5%). After hybridization, I used GenePix Pro 6.1 for spot
> finding. After reading the data into limma, I want to use the MSP and
> control spots for normalization, but I don't know how to label the spot
> types with readSpotTypes() when the chips use different GAL files.
>
> Thanks,
>
> Tiandao
>
> I am fairly new to R and limma. My setup is as follows.
>
> > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> [7] "base"
>
> other attached packages:
> statmod limma
> "1.3.0" "2.10.5"
>
> Code used for the analysis:
>
> library(limma)
>
> A <- list(R="F635 Median", G="F532 Median", Rb="B635", Gb="B532")
> B <- c("Block", "Column", "Row", "Name", "ID", "X", "Y", "Dia.",
>   "F635 Median", "F635 Mean", "F635 SD", "F635 CV", "B635", "B635 Median",
>   "B635 Mean", "B635 SD", "B635 CV", "% > B635+1SD", "% > B635+2SD",
>   "F635 % Sat.", "F532 Median", "F532 Mean", "F532 SD", "F532 CV", "B532",
>   "B532 Median", "B532 Mean", "B532 SD", "B532 CV", "% > B532+1SD",
>   "% > B532+2SD", "F532 % Sat.", "Ratio of Medians (635/532)",
>   "Ratio of Means (635/532)", "Median of Ratios (635/532)",
>   "Mean of Ratios (635/532)", "Ratios SD (635/532)", "Rgn Ratio (635/532)",
>   "Rgn R2 (635/532)", "F Pixels", "B Pixels", "Circularity",
>   "Sum of Medians (635/532)", "Sum of Means (635/532)",
>   "Log Ratio (635/532)", "F635 Median - B635", "F532 Median - B532",
>   "F635 Mean - B635", "F532 Mean - B532", "F635 Total Intensity",
>   "F532 Total Intensity", "SNR 635", "SNR 532", "Flags", "Normalize",
>   "Autoflag")
>
> # read the 6 test files listed in the targets file
> targets <- readTargets(file="targets.txt", row.names="Name")
> RG <- read.maimages(targets$FileName, source="genepix", ext="gpr",
>   columns=A, other.columns=B)
> spottypes <- readSpotTypes("spottypes3.txt") # short spot types
> RG$genes$Status <- controlStatus(spottypes, RG)
>
> targets
> SlideNumber FileName Cy3 Cy5 Name
> 1 13582917 N0 N1 N0N121
> 2 13582918 N0 N1 N0N122
> 3 13590446 N0 N1 N0N123
> 4 13590420 N1 H1 N1H121
> 5 13590521 N1 H1 N1H122
> 6 13591193 N1 H1 N1H123
>
> spottypes3
> SpotType ID Color
> gene * black
> Calibration Calib* blue
> Ratio Ratio* red
> Negative Neg*|Util* brown
> MSP MSP orange
> Alexa Alexa* yellow
> blank NotDefined green