[BioC] Communicating gene-probe relationships to sam
Charles C. Berry
cberry at tajo.ucsd.edu
Sun May 7 00:22:00 CEST 2006
Charles,
You used method='cat.stat', which is (only) suitable for SNPs, I think.
This seems like an odd choice for detecting SFPs. You would have to do
more work to get this to work properly if you actually were counting SNPs
- i.e. you would have to **call** the SNPs. (And reasonably enough, sam()
doesn't like small sample numbers either.)
I suggest you use a linear model (method='d.stat') like the one in our
Genome Research paper available here:
http://naturalsystems.uchicago.edu/naturalvariation/sfp/
BTW, sam() will not object to very small sample numbers with
'method="d.stat"'
HTH,
Chuck Berry
On Fri, 5 May 2006, Charles Crane wrote:
> Dear newsgroup:
> I am trying to use sam to identify single-feature polymorphisms between
> two barley cultivars. I am running Bioconductor 1.7, Biobase 1.8.0, affy
> 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9. I can successfully extract
> probe-level data from my CEL files, background correct the pm values with
> rma, quantile normalize, and isolate the pm intensities in a data frame, but
> both sam and sam.snp consistently and instantly fail with "Error in
> FUN(data, cl, ...) : There should be at least 25 samples." This happens
> even though there are 22840 genes and 11 probes per gene in the data. What
> am I overlooking? Does sam need some sort of a mapping from the 11 probes
> to their parent gene? A session history follows.
> Thank your very much for your time. Also, do you plan to include
> directions for using sam for SNP analysis, going all the way from ReadAffy
> through sam for a case history, in a future vignette? It would save the
> uninitiated a lot of time and frustration.
>
> Charles Crane
> USDA-ARS, MWA, Crop Production and Pest Control Research Unit
> And Department of Botany and Plant Pathology, Purdue University
>
>> library(affy)
>> library(siggenes)
>> sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL",
> "000114_12hr_kindered_H20_inoc_3bx1.CEL",
> + "000122_5hr_peruvian_H20_inoc_14bx1.CEL",
> "000123_12hr_peruvian_H20_inoc_15bx1.CEL",
> "000141_Non-host-resistant-barley_16bx2.CEL",
> + "000157_24hr_Kindered_H2O_Inoc_1.CEL",
> "000158_0hr_Kindered_H2O_Inoc_4.CEL",
> "000159_0hr_Kindered_S_passerini_inoc_5.CEL",
> + "000161_0hr_Peruvian_H2O_inoc_13.CEL",
> "000162_0hr_Peruvian_S_passerini_17.CEL",
> "000189_0hr_403-rep2_H2O_inoc_21b.CEL",
> + "000190_5hr_403-rep2_H2O_inoc_22b.CEL",
> "000191_12hr_403-rep2_H2O_inoc_23b.CEL",
> "000192_24hr_403-rep2_H2O_inoc_24b.CEL",
> + "000193_0hr_403-rep2_inoc-s_pass_25b.CEL",
> "000201_0hr_405-rep2_H2O_inoc_33b.CEL",
> "000202_5hr_405-rep2_H2O_inoc_34b.CEL",
> + "000203_12hr_405-rep2_H2O_inoc_35b.CEL",
> "000204_24hr_405-rep2_H2O_inoc_36b.CEL",
> "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"),
> + sampleNames = c("000113_5hr_kindered_H20_inoc_2x2",
> "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1",
> + "000123_12hr_peruvian_H20_inoc_15bx1",
> "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1",
> + "000158_0hr_Kindered_H2O_Inoc_4",
> "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13",
> + "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b",
> "000190_5hr_403-rep2_H2O_inoc_22b",
> + "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b",
> "000193_0hr_403-rep2_inoc-s_pass_25b",
> + "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b",
> "000203_12hr_405-rep2_H2O_inoc_35b",
> + "000204_24hr_405-rep2_H2O_inoc_36b",
> "000205_0hr_405-rep2_inoc-s_pass_37b"),
> + phenoData = "Hordeumphenodata.txt")
>> signames <- geneNames(sidata)
>> sirma <- bg.correct(sidata, method = "rma")
>> sinorm <- normalize(sirma)
>> siprobeintensities <- as.data.frame(probes(sinorm, "pm"))
>> write.table(siprobeintensities, file = "siprobeintensities.txt")
>> sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
>> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names =
> dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279)
> Error in FUN(data, cl, ...) : There should be at least 25 samples.
>> sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character')
> Read 251437 items
>> sigenenames[1:22]
> [1] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at"
> [3] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at"
> [5] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at"
> [7] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at"
> [9] "1200459_Reg_88-1740_at" "1200459_Reg_88-1740_at"
> [11] "1200459_Reg_88-1740_at" "1289374_Reg_826-1545_at"
> [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
>> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names =
> sigenenames, B = 1000, ran = 11279)
> Error in FUN(data, cl, ...) : There should be at least 25 samples.
>
>
>
> [ Part 3.6: "Included Message" ]
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717
More information about the Bioconductor
mailing list