[BioC] OpenArray Input Difficulties - Feature Number Errors, Sample Classification
Scott Robinson [guest]
guest at bioconductor.org
Mon Nov 26 14:59:15 CET 2012
I am working with an OpenArray miRNA dataset with 72 samples.
I am having a little trouble with the file input. I had been told by the lab scientist who gathered the data that there were 750 genes measured by this array, so I tried this:
> fileList <- c("Runs 1-4.csv","Runs 5-8.csv")
> memStickPath <- "E:/Work/miRNomics/miRNA data/raw"
>
> sampleCounts <- c(36,36)
>
> raw <- readCtData(files = fileList, path = memStickPath, format = "OpenArray", n.features = 750, n.data = sampleCounts)
Warning messages:
1: In matrix(sample[, column.info[["Ct"]]], ncol = n.data[i]) :
data length [26994] is not a sub-multiple or multiple of the number of rows [750]
2: In matrix(sample[, column.info[["flag"]]], ncol = n.data[i]) :
data length [26994] is not a sub-multiple or multiple of the number of rows [750]
The first odd thing here is that my file has 29448 rows, not the 26994 quoted in the warning (which turns out to be 3 samples shorter: 29448 = 36 x 818, while 26994 = 33 x 818). Because the warning complains that the data length is not a multiple of 750, I looked at the file and discovered that there appear to be 818 rows per sample, so...
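As a sanity check on those figures, the arithmetic can be reproduced in base R. This is only a sketch using the row counts quoted above; the variable names are made up for illustration and nothing here calls HTqPCR.

```r
# Row counts taken from the post; variable names are illustrative only.
rows_in_file    <- 29448  # rows counted in one 36-sample file
rows_in_warning <- 26994  # data length reported in the warning
rows_per_sample <- 818    # rows observed per sample in the file

# Whole samples implied by each row count
samples_in_file    <- rows_in_file / rows_per_sample
samples_in_warning <- rows_in_warning / rows_per_sample
samples_missing    <- samples_in_file - samples_in_warning

# Remainder when the reported data length is divided by 750;
# a non-zero value is what triggers the "not a multiple" warning.
remainder_750 <- rows_in_warning %% 750

print(c(samples_in_file, samples_in_warning, samples_missing, remainder_750))
```

So the reported data length is a whole number of 818-row samples, just three fewer than the file appears to contain.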
> fileList <- c("Runs 1-4.csv","Runs 5-8.csv")
> memStickPath <- "E:/Work/miRNomics/miRNA data/raw"
>
> sampleCounts <- c(36,36)
>
> raw <- readCtData(files = fileList, path = memStickPath, format = "OpenArray", n.features = 818, n.data = sampleCounts)
Error in `[<-.data.frame`(`*tmp*`, undeter, value = "Undetermined") :
only logical matrix subscripts are allowed in replacement
In addition: Warning message:
In matrix(sample[, column.info[["Ct"]]], ncol = n.data[i]) :
data length [26994] is not a sub-multiple or multiple of the number of rows [750]
I have since tried joining the two files (of 36 samples each) into one file (of 72):
> thisPath <- "C:/Users/sr216a/Documents/PreEc_miRNA/raw"
>
> sampleCount <- 72
>
> raw <- readCtData(files = "allRuns.csv", path = thisPath, format = "OpenArray", n.features = 818, n.data = sampleCount)
Error in `[<-.data.frame`(`*tmp*`, undeter, value = "Undetermined") :
only logical matrix subscripts are allowed in replacement
In addition: Warning message:
In matrix(sample[, column.info[["Ct"]]], ncol = n.data[i]) :
data length [56442] is not a sub-multiple or multiple of the number of rows [784]
I returned to trying "n.features = 750" and the command appears to work! I am quite confused as to what is going on here and would very much appreciate any help regarding:
- which 750 of the 818 rows per sample are being picked up, why, and how
- how samples are distinguished from one another (does the method read the "SampleInfo.SampleID" column, or should the files be pre-ordered by sample?)
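Independently of how readCtData handles this, the rows-per-sample figure and the row ordering can be checked by tabulating the sample ID column. The following is a minimal base R sketch on a toy data frame standing in for the real export: only the column name SampleInfo.SampleID comes from the file, while the sample names, the Feature column and the Ct values are invented for illustration.

```r
# Toy stand-in for the OpenArray export: 3 samples x 4 features.
# Only the SampleInfo.SampleID column name comes from the real file;
# everything else (names, values, the Feature column) is hypothetical.
toy <- data.frame(
  SampleInfo.SampleID = rep(c("S1", "S2", "S3"), each = 4),
  Feature = rep(paste0("miR-", 1:4), times = 3),
  Ct = c(21.3, 25.0, NA, 30.1, 22.0, 24.8, 33.5, NA, 20.9, 25.4, 31.2, 29.7),
  stringsAsFactors = FALSE
)

# Number of rows per sample; a single distinct value means every
# sample contributes the same number of features.
rows_per_sample <- table(toy$SampleInfo.SampleID)
distinct_counts <- unique(as.vector(rows_per_sample))

# Are the rows already grouped by sample? If each ID forms exactly one
# contiguous run, the number of runs equals the number of distinct IDs.
runs <- rle(toy$SampleInfo.SampleID)
is_contiguous <- length(runs$values) == length(unique(toy$SampleInfo.SampleID))

print(rows_per_sample)
print(is_contiguous)
```

On the real data, the same two checks would show whether all 72 samples have 818 rows and whether the file is already ordered by sample.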
I am also a little confused about how one associates samples with classifications e.g. case and control. It seems that most of the methods utilising this info use "groups = files$Treatment", but I don't seem to be able to find a description of the format of this file. Is "phenoData" meant to contain similar info? Is the "phenoData" important for standard usage of the package or is this an additional helpful data structure?
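To make the question concrete, the kind of sample table I have in mind is sketched below in base R. This is an assumption about what such a table might look like, not the documented format of the file behind "files$Treatment": the column names Sample and Treatment, the sample labels and the case/control assignments are all hypothetical.

```r
# Hypothetical sample annotation table: one row per sample.
# Column names and group assignments are illustrative assumptions,
# not a format confirmed by the HTqPCR documentation.
samples <- data.frame(
  Sample    = paste0("Sample", 1:6),
  Treatment = c("case", "control", "case", "control", "case", "control"),
  stringsAsFactors = FALSE
)

# A grouping factor in the same order as the samples, i.e. the sort of
# object that would be passed as groups = samples$Treatment.
groups <- factor(samples$Treatment, levels = c("control", "case"))

# Quick check that every sample has a group and how many fall in each.
group_sizes <- table(groups)
print(group_sizes)
```

The point of the sketch is only that the grouping vector has to be in the same order as the samples in the data object; whether the package expects this in "phenoData" or as a separate table is exactly what I am unsure about.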
Any help would be very much appreciated,
Scott
-- output of sessionInfo():
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] HTqPCR_1.12.0 limma_3.14.1 RColorBrewer_1.0-5 Biobase_2.18.0
[5] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] affy_1.36.0 affyio_1.26.0 BiocInstaller_1.8.3
[4] gdata_2.12.0 gplots_2.11.0 gtools_2.7.0
[7] preprocessCore_1.20.0 stats4_2.15.2 tools_2.15.2
[10] zlibbioc_1.4.0
--
Sent via the guest posting facility at bioconductor.org.