[BioC] Unable to Generate QC Report for mogene10stv1

James W. MacDonald jmacdon at med.umich.edu
Tue Jan 11 15:58:01 CET 2011


Hi Rick,

On 1/10/2011 4:57 PM, Rick Frausto wrote:
> Hi Jim,
>
> You're right...
>
>> any(duplicated(unlist(indexProbes(mydata, "both"))))
> [1] TRUE
>>
>
> Figured it would be something simple, almost always is. Guess since the MM
> values are only really necessary for calculating a "real" PM value I should
> generally still be ok with using R Bioconductor packages for downstream
> analysis of these chips?? For example, using eset<-rma() to normalize my
> data should still be ok.

Yep. RMA only uses PM values, so this will be fine. You only get into 
trouble when trying to use mas5 based methods.
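
To make the PM-only point concrete, here is a toy sketch in base R. It is not the actual RMA or MAS5 code: the vectors `pm` and `mm` and both summaries are made up for illustration, and it assumes the MM intensities are simply unavailable (NA) on a PM-only chip.

```r
# Toy intensities for one probe set (hypothetical values, not real data).
pm <- c(120, 340, 95, 410)
# Assumption: on a PM-only chip the MM intensities are unavailable (NA).
mm <- rep(NA_real_, length(pm))

# A PM-only summary, loosely in the spirit of RMA, which ignores MM.
pm_only <- median(log2(pm))

# A summary that needs MM, loosely in the spirit of a MAS5-style PM - MM.
pm_minus_mm <- median(pm - mm)

pm_only       # a finite number
pm_minus_mm   # NA, because every PM - MM difference is NA
```

Anything computed from PM alone still works, while anything that subtracts or compares MM has nothing to work with.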

>
> By the way, the documentation on the AffyQCReport function regarding
> signalDist() states that "The first is a boxplot plot of the all pm
> intensities and the second plot consists of kernel density estimates of
> these intensities." From this it would seem to a novice like me that it only
> uses PM values, clearly I'm not correct. I guess these are PM values
> adjusted for the MM signal.

Nope, they aren't adjusted for MM, they just include the MM values as 
well. Here is a little primer on how to see what is going on.

If you load the affyQCReport package and then type signalDist at the R 
prompt, you will get this:

 > signalDist
function (object)
{
     par(mfrow = c(2, 1))
     ArrayIndex = as.character(1:length(sampleNames(object)))
     boxplot(object, names = ArrayIndex, ylab = "Log2(Intensity)",
         xlab = "Array Index")
     hist(x = object, lt = 1:length(ArrayIndex), col = 1:length(ArrayIndex),
         which = "both")
     temppar <- par()
     legend(((temppar$xaxp[2] - temppar$xaxp[1])/temppar$xaxp[3]) *
         (temppar$xaxp[3] - 1) + temppar$xaxp[1], temppar$yaxp[2],
         as.character(ArrayIndex), lt = 1:length(ArrayIndex),
         col = 1:length(ArrayIndex), cex = 0.5)
}
<environment: namespace:affyQCReport>

So you can see that we are calling boxplot() as well as hist() on the 
'object', which is an AffyBatch. Let's see what boxplot() and hist() do.

 > boxplot
standardGeneric for "boxplot" defined from package "graphics"

function (x, ...)
standardGeneric("boxplot")
<environment: 0x184ea378>
Methods may be defined for arguments: x
Use  showMethods("boxplot")  for currently available ones.

So this is an S4 generic, and its methods are slightly harder to get to, 
but let's follow the prescription on the last line.

 > showMethods(boxplot, class = "AffyBatch", includeDefs = TRUE)
Function: boxplot (package graphics)
x="AffyBatch"
function (x, ...)
{
     .local <- function (x, which = "both", range = 0, main, ...)
     {
         tmp <- description(x)
         if (missing(main) && (is(tmp, "MIAME")))
             main <- tmp@title
         tmp <- unlist(indexProbes(x, which))
         tmp <- tmp[seq(1, length(tmp), len = 5000)]
         boxplot(data.frame(log2(intensity(x)[tmp, ])), main = main,
             range = range, ...)
     }
     .local(x, ...)
}

Note two things here. I added in class = "AffyBatch", because there may 
be other boxplot methods for other objects, and we really don't care 
about them. Additionally, I included includeDefs = TRUE, which will 
cause the function to be output.
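
The same inspection steps can be tried on a small self-contained S4 example. This is only a sketch: the class `ToyBatch` and the generic `summarise_toy` are hypothetical names invented here, not part of affy. Because the method has an extra `which` argument that the generic lacks, R wraps it in a `.local` function, much like the AffyBatch boxplot method shown above.

```r
library(methods)

# Hypothetical class and generic, used only to illustrate the mechanics.
setClass("ToyBatch", representation(values = "numeric"))
setGeneric("summarise_toy", function(x, ...) standardGeneric("summarise_toy"))

# The method adds a 'which' argument with a default, so the stored
# definition is rewritten to use an internal .local function.
setMethod("summarise_toy", "ToyBatch", function(x, which = "both", ...) {
    paste("using", which)
})

# Print only the method for our class, including its definition.
showMethods("summarise_toy", classes = "ToyBatch", includeDefs = TRUE)

obj <- new("ToyBatch", values = c(1, 2, 3))
res_default <- summarise_toy(obj)               # default 'which' is used
res_pm      <- summarise_toy(obj, which = "pm") # override passed through '...'
```

Here `res_default` is "using both" and `res_pm` is "using pm", which shows how a caller can override the default only by passing the argument explicitly.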

The .local function has a default of which = 'both', and you can see that 
argument is used in the call to indexProbes. Note also that .local has a 
'...' argument, so signalDist() could pass which = "pm" to override the 
default. It does not, so the help page is incorrect. If you look at 
?indexProbes, you will see this in the methods section:

indexProbes 'signature(object = "AffyBatch", which =
           "character")': returns a list with locations of the probes in
           each probe set. The affyID corresponding to the probe set to
           retrieve can be specified in an optional parameter
           'genenames'. By default, all the affyIDs are retrieved. The
           names of the elements in the list returned are the affyIDs.
           'which' can be "pm", "mm", or "both". If "both" then perfect
           match locations are given followed by mismatch locations.
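
A rough sketch of why the duplicated() check comes back TRUE on a PM-only chip follows. The function `toy_index_probes` is a hypothetical stand-in that only mimics the documented ordering (PM locations followed by MM locations). It is not the affy implementation, and representing the missing MM locations as NA is an assumption.

```r
# Hypothetical stand-in for the documented behaviour of indexProbes():
# "both" returns PM locations followed by MM locations for each probe set.
toy_index_probes <- function(pm_index, mm_index,
                             which = c("both", "pm", "mm")) {
    which <- match.arg(which)
    switch(which,
           pm   = pm_index,
           mm   = mm_index,
           both = Map(c, pm_index, mm_index))
}

pm_index <- list(setA = c(11L, 12L, 13L), setB = c(21L, 22L))
# Assumption: a PM-only chip has no MM locations, represented here as NA.
mm_index <- lapply(pm_index, function(i) rep(NA_integer_, length(i)))

both <- unlist(toy_index_probes(pm_index, mm_index, "both"))
pm   <- unlist(toy_index_probes(pm_index, mm_index, "pm"))

any(duplicated(both))  # TRUE: the repeated NA values count as duplicates
any(duplicated(pm))    # FALSE
```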

The warning you get comes from here:

tmp <- unlist(indexProbes(x, which))
tmp <- tmp[seq(1, length(tmp), len = 5000)]
boxplot(data.frame(log2(intensity(x)[tmp, ])), main = main,
             range = range, ...)

This code is basically taking a subset of 5000 probe indices to create 
the boxplot. Since half of your indices from indexProbes() will be NA, a 
bunch of the tmp variable will be NAs as well. We can re-create the 
warning you get below with a little example:

 > x <- matrix(rnorm(100), ncol = 10)
 > row.names(x) <- letters[1:10]
 > z <- data.frame(x[c(1,2,3,NA,4,5,NA),])
Warning message:
In data.row.names(row.names, rowsi, i) :
   some row.names duplicated: 7 --> row.names NOT used

Best,

Jim


>
> Thanks for figuring this out for me. Let me know if these and other related
> questions would be better served as standalone e-mails.
>
> Cheers,
> Rick
>
>
>
> On 10/01/11 7:04 AM, "James W. MacDonald"<jmacdon at med.umich.edu>  wrote:
>
>> Hi Rick,
>>
>> After all that, the reason is really simple. You are trying to use
>> affyQCReport on a PM-only chip, which isn't going to work out so well. I
>> don't have any mogene data around to play with (and don't have the time
>> to go searching), so I will have to make some educated guesses.
>>
>> Internally in signalDist() you are calling boxplot() and hist() on your
>> AffyBatch. And the default for both functions is to use both PM and MM
>> probes. I'm betting that
>>
>> any(duplicated(unlist(indexProbes(mydata, "both"))))
>>
>> returns TRUE, indicating that indexProbes doesn't work correctly on a
>> PM-only chip, which is fair enough, as it was never designed to do so.
>>
>> And plot(qc(mydata)) will never work, as it relies on computing a
>> Wilcoxon signed-rank between the PM and MM probes, and since you don't
>> have MM probes, well you get the picture...
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 1/7/2011 6:56 PM, Rick Frausto wrote:
>>> Hi Jim,
>>>
>>> Ok, so after doing a bit of reading and re-reading I was eventually able to
>>> generate each page in a quartz window that the "QCReport" function should
>>> also generate. I found which ones give me the errors. So, there should be 6
>>> pages in total. Page 2 gives me the duplication error and page 3 gives me
>>> the error in evaluating the argument x. The other pages are ok and are
>>> generated as expected.
>>>
>>> In brief, page 2 is supposed to be generated with the "signalDist(mydata)"
>>> command. Page 3 is supposed to be generated with the "plot(qc(mydata))" command.
>>>
>>> So, I guess there must be particular requirements for these commands that
>>> I'm missing. I've included the session below along with traceback() and
>>> sessionInfo().
>>>
>>>
>>> R version 2.12.0 (2010-10-15)
>>> Copyright (C) 2010 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>> You are welcome to redistribute it under certain conditions.
>>> Type 'license()' or 'licence()' for distribution details.
>>>
>>>     Natural language support but running in an English locale
>>>
>>> R is a collaborative project with many contributors.
>>> Type 'contributors()' for more information and
>>> 'citation()' on how to cite R or R packages in publications.
>>>
>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>> 'help.start()' for an HTML browser interface to help.
>>> Type 'q()' to quit R.
>>>
>>> [R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]
>>>
>>> [Workspace restored from /Users/rickfrausto/.RData]
>>> [History restored from /Users/rickfrausto/.Rapp.history]
>>>
>>>> library(simpleaffy)
>>> Loading required package: affy
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>>     Vignettes contain introductory material. To view, type
>>>     'openVignette()'. To cite Bioconductor, see
>>>     'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>> Loading required package: genefilter
>>> Loading required package: gcrma
>>>
>>> Attaching package: 'simpleaffy'
>>>
>>> The following object(s) are masked _by_ '.GlobalEnv':
>>>
>>>       getBioC
>>>
>>>> library(affy)
>>>> mydata<- ReadAffy()
>>>> eset<- rma(mydata)
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>>>> library(affycoretools); affystart(plot=T, express="rma")
>>> Loading required package: GO.db
>>> Loading required package: AnnotationDbi
>>> Loading required package: DBI
>>> Loading required package: KEGG.db
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>>> Please give the x-coordinate for a legend.30
>>> Please give the y-coordinate for a legend.80
>>> ExpressionSet (storageMode: lockedEnvironment)
>>> assayData: 34760 features, 35 samples
>>>     element names: exprs
>>> protocolData
>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>     varLabels: ScanDate
>>>     varMetadata: labelDescription
>>> phenoData
>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>     varLabels: sample
>>>     varMetadata: labelDescription
>>> featureData: none
>>> experimentData: use 'experimentData(object)'
>>> Annotation: mogene10stv1
>>>> write.exprs(eset, file="mydata.txt")
>>>> x<- data.frame(exprs(eset), exprs(eset_PMA), assayDataElement(eset_PMA,
>>> "se.exprs")); x<- x[,sort(names(x))]; write.table(x, file="mydata_PMA.xls",
>>> quote=F, col.names = NA, sep="\t")
>>> Error in exprs(eset_PMA) :
>>>     error in evaluating the argument 'object' in selecting a method for
>>> function 'exprs'
>>>> mypm<- pm(mydata)
>>>> mymm<- mm(mydata)
>>>> myaffyids<- probeNames(mydata)
>>>> result<- data.frame(myaffyids, mypm, mymm)
>>>> eset; pData(eset)
>>> ExpressionSet (storageMode: lockedEnvironment)
>>> assayData: 34760 features, 35 samples
>>>     element names: exprs
>>> protocolData
>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>     varLabels: ScanDate
>>>     varMetadata: labelDescription
>>> phenoData
>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>     varLabels: sample
>>>     varMetadata: labelDescription
>>> featureData: none
>>> experimentData: use 'experimentData(object)'
>>> Annotation: mogene10stv1
>>>                          sample
>>> A_WT1_NT_2hr.CEL            1
>>> B_WT1_NT_2hr.CEL            2
>>> C_WT1_NT_12hr.CEL           3
>>> D_WT1_NT_12hr.CEL           4
>>> E_WT1_HIL6_2hr.CEL          5
>>> F_WT1_HIL6_2hr.CEL          6
>>> G_WT1_HIL6_12hr.CEL         7
>>> H_WT1_HIL6_12hr.CEL         8
>>> I_FF_NT_2hr.CEL             9
>>> J_FF_NT_2hr.CEL            10
>>> K_FF_NT_12hr.CEL           11
>>> L_FF_NT_12hr.CEL           12
>>> M_FF_HIL6_2hr.CEL          13
>>> N_FF_HIL6_2hr.CEL          14
>>> O_FF_HIL6_12hr.CEL         15
>>> P_FF_HIL6_12hr.CEL         16
>>> Q_WT2_NT_2hr.CEL           17
>>> R_WT2_NT_2hr.CEL           18
>>> S_WT2_NT_12hr.CEL          19
>>> T_WT2_NT_12hr.CEL          20
>>> U_WT2_HIL6_2hr.CEL         21
>>> V_WT2_HIL6_2hr.CEL         22
>>> W_WT2_HIL6_12hr.CEL        23
>>> X_WT2_HIL6_12hr.CEL        24
>>> Y_DD_NT_2hr.CEL            25
>>> Z_DD_NT_2hr.CEL            26
>>> ZA_DD_NT_12hr.CEL          27
>>> ZB_DD_NT_12hr.CEL          28
>>> ZC_DD_HIL6_2hr.CEL         29
>>> ZD_DD_HIL6_2hr.CEL         30
>>> ZE_DD_HIL6_12hr.CEL        31
>>> ZF_DD_HIL6_12hr.CEL        32
>>> ZG_ST1KO_NT_2hr.CEL        33
>>> ZH_ST1KO_HIL6_2hr.CEL      34
>>> ZI_ST1KO_HIL6_12hr.CEL     35
>>>> data.frame(eset)
>>>                          X10338001 X10338003 X10338004 X10338017 X10338025
>>> A_WT1_NT_2hr.CEL        11.71717 10.183620  9.440631  12.79412  8.823529
>>> B_WT1_NT_2hr.CEL        11.78778 10.027760  9.489226  12.98544  8.843002
>>>                          X10338026 X10338029 X10338035 X10338036 X10338037
>>> A_WT1_NT_2hr.CEL        13.22585  9.405038  8.853564  9.379031  3.661987
>>> B_WT1_NT_2hr.CEL        13.29043  9.575309  8.772872  9.513050  3.514885
>>>                          X10338041 X10338042 X10338044 X10338047 X10338056
>>> A_WT1_NT_2hr.CEL        10.94638 10.116516  11.88296  8.872839  3.133222
>>> B_WT1_NT_2hr.CEL        11.23276 10.134084  12.03381  7.568584  3.088548
>>>                          X10338059 X10338060 X10338063 X10338064 X10338065
>>>
>>> JIM, I TRUNCATED THIS LIST, BUT THOUGHT IT MIGHT BE USEFUL IN DIAGNOSING THE
>>> PROBLEMS I'M HAVING. SESSION IS CONTINUED BELOW.
>>>
>>>> library(affyQCReport)
>>> Loading required package: lattice
>>>> titlePage(mydata)
>>> [1] TRUE
>>>> signalDist(mydata)
>>> Warning message:
>>> In data.row.names(row.names, rowsi, i) :
>>>     some row.names duplicated:
>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,5
>>> 4,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,1
>>> 03,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,1
>>> 47,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,1
>>> 73,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,2
>>> 10,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,2
>>> 52,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,2
>>> 96,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,3
>>> 38,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,3
>>> 82,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,4
>>> 07,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,4
>>> 49,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,4
>>> 95,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
>>> truncated]
>>>> plot(qc(mydata))
>>> Error in plot(qc(mydata)) :
>>>     error in evaluating the argument 'x' in selecting a method for function
>>> 'plot'
>>>> borderQC1(mydata)
>>> [1] TRUE
>>>> borderQC2(mydata)
>>> [1] TRUE
>>>> correlationPlot(mydata)
>>> [1] TRUE
>>>> titlePage(mydata)
>>> [1] TRUE
>>>> titlePage(mydata)
>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>     plot.new has not been called yet
>>>> correlationPlot(mydata)
>>> [1] TRUE
>>>> titlePage(mydata)
>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>     plot.new has not been called yet
>>> In addition: Warning message:
>>> Display list redraw incomplete
>>>> borderQC1(mydata)
>>> [1] TRUE
>>>> titlePage(mydata)
>>> [1] TRUE
>>>> titlePage(mydata)
>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>     plot.new has not been called yet
>>>> traceback()
>>> 2: polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05))
>>> 1: titlePage(mydata)
>>>> sessionInfo()
>>> R version 2.12.0 (2010-10-15)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>    [1] affyQCReport_1.28.1   lattice_0.19-13       affycoretools_1.22.0
>>>    [4] KEGG.db_2.4.5         GO.db_2.4.5           RSQLite_0.9-4
>>>    [7] DBI_0.2-5             AnnotationDbi_1.12.0  mogene10stv1cdf_2.7.0
>>> [10] simpleaffy_2.26.1     gcrma_2.22.0          genefilter_1.32.0
>>> [13] affy_1.28.0           Biobase_2.10.0
>>>
>>> loaded via a namespace (and not attached):
>>>    [1] affyio_1.18.0         affyPLM_1.26.0        annaffy_1.22.0
>>>    [4] annotate_1.28.0       biomaRt_2.6.0         Biostrings_2.18.2
>>>    [7] Category_2.16.0       GOstats_2.16.0        graph_1.28.0
>>> [10] grid_2.12.0           GSEABase_1.12.2       IRanges_1.8.7
>>> [13] limma_3.6.9           preprocessCore_1.12.0 RBGL_1.26.0
>>> [16] RColorBrewer_1.0-2    RCurl_1.4-3           splines_2.12.0
>>> [19] survival_2.36-2       tools_2.12.0          XML_3.2-0
>>> [22] xtable_1.5-6
>>>>
>>>
>>> On 7/01/11 12:47 PM, "James W. MacDonald"<jmacdon at med.umich.edu>   wrote:
>>>
>>>> Hi Rick,
>>>>
>>>> What happens if you load the simpleaffy package first?
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>> On 1/7/2011 2:14 PM, Rick Frausto wrote:
>>>>> Hi James,
>>>>>
>>>>> Below is the information that you requested - traceback() and
>>>>> sessionInfo().
>>>>> Doesn't seem like much to me, but perhaps you can help. As you answer a
>>>>> lot of e-mails, thought I'd remind you that this is in regards to the "some
>>>>> row.names duplicated" error.
>>>>>
>>>>> Hope your holidays were good!
>>>>>
>>>>> -Rick
>>>>>
>>>>> [R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]
>>>>>
>>>>> [Workspace restored from /Users/rickfrausto/.RData]
>>>>> [History restored from /Users/rickfrausto/.Rapp.history]
>>>>>
>>>>>> library(affy)
>>>>> Loading required package: Biobase
>>>>>
>>>>> Welcome to Bioconductor
>>>>>
>>>>>      Vignettes contain introductory material. To view, type
>>>>>      'openVignette()'. To cite Bioconductor, see
>>>>>      'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>>>
>>>>>> mydata<- ReadAffy()
>>>>>> eset<- rma(mydata)
>>>>> Background correcting
>>>>> Normalizing
>>>>> Calculating Expression
>>>>>> write.exprs(eset, file="mydata.txt")
>>>>>> mypm<- pm(mydata)
>>>>>> mymm<- mm(mydata)
>>>>>> myaffyids<- probeNames(mydata)
>>>>>> result<- data.frame(myaffyids, mypm, mymm)
>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>> Loading required package: lattice
>>>>> Warning message:
>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>      some row.names duplicated:
>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,5
>>>>> 4,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,1
>>>>> 03,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,1
>>>>> 47,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,1
>>>>> 73,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,2
>>>>> 10,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,2
>>>>> 52,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,2
>>>>> 96,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,3
>>>>> 38,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,3
>>>>> 82,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,4
>>>>> 07,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,4
>>>>> 49,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,4
>>>>> 95,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
>>>>> truncated]
>>>>> Error in plot(qc(object)) :
>>>>>      error in evaluating the argument 'x' in selecting a method for function
>>>>> 'plot'
>>>>>> traceback()
>>>>> 2: plot(qc(object))
>>>>> 1: QCReport(mydata, file = "ExampleQC.pdf")
>>>>>> sessionInfo()
>>>>> R version 2.12.0 (2010-10-15)
>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] affyQCReport_1.28.1   lattice_0.19-13       mogene10stv1cdf_2.7.0
>>>>> [4] affy_1.28.0           Biobase_2.10.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>     [1] affyio_1.18.0         affyPLM_1.26.0        annotate_1.28.0
>>>>>     [4] AnnotationDbi_1.12.0  Biostrings_2.18.2     DBI_0.2-5
>>>>>     [7] gcrma_2.22.0          genefilter_1.32.0     grid_2.12.0
>>>>> [10] IRanges_1.8.7         preprocessCore_1.12.0 RColorBrewer_1.0-2
>>>>> [13] RSQLite_0.9-4         simpleaffy_2.26.1     splines_2.12.0
>>>>> [16] survival_2.36-2       tools_2.12.0          xtable_1.5-6
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 20/12/10 6:33 AM, "James W. MacDonald"<jmacdon at med.umich.edu>    wrote:
>>>>>
>>>>>> Hi Rick,
>>>>>>
>>>>>> On 12/17/2010 9:24 PM, Rick Frausto wrote:
>>>>>>> Hey Jim,
>>>>>>>
>>>>>>> Ok, I will give that a go. The only problem is an ExpressionSet contains
>>>>>>> all of the necessary information for further analysis (e.g. phenodata,
>>>>>>> featuredata and annotation, etc - including, treatment type, cell type,
>>>>>>> time points, replicates). I am still learning how to include all of
>>>>>>> these for a complete ExpressionSet. As a starting point I've loaded a txt
>>>>>>> file containing some of this information (gene abbrev, ontology, probeset
>>>>>>> ID) which I created using Affymetrix's Expression Console software,
>>>>>>> without replicate, time point and cell type info. Doing this I've gotten
>>>>>>> as far as creating a minimal ExpressionSet, which I guess the functions
>>>>>>> you mention below do just that but with the information contained in the
>>>>>>> CEL file only.
>>>>>>>
>>>>>>> In any case, since as you say, the functions in the online manual create
>>>>>>> a proper ExpressionSet why would I get the issue of duplication?
>>>>>>
>>>>>> Oh yeah, the original question ;-D. Try running QCReport() again, and
>>>>>> when it errors out run traceback() and send the output. Also include the
>>>>>> output of sessionInfo().
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> In regards to the 64-bit discussion. It may have very well made enough of
>>>>>>> a difference as it did not come up with the memory error the last time I
>>>>>>> tried it. Going to upgrade to 8GB RAM anyways, can't hurt.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Rick
>>>>>>>
>>>>>>>
>>>>>>> On 17/12/10 7:20 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Rick,
>>>>>>>>
>>>>>>>> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>>>>>>>>> Hi Jim,
>>>>>>>>>
>>>>>>>>> How do I run an RMA analysis without a proper ExpressionSet? Honest
>>>>>>>>> answer, I don't know, I just put in a command line from a manual I
>>>>>>>>> found online and it spit out some result - see #3 Affy packages in the
>>>>>>>>> following link (
>>>>>>>>> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro
>>>>>>>>> ).
>>>>>>>>
>>>>>>>> You are mistaken. All of the functions mentioned there result in a
>>>>>>>> proper ExpressionSet. And if you just do
>>>>>>>>
>>>>>>>> abatch<- ReadAffy()
>>>>>>>> eset<- rma(abatch)
>>>>>>>>
>>>>>>>> Then you will 100% surely get an ExpressionSet.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Perhaps you don't need an ExpressionSet until after the preprocessing,
>>>>>>>>> at least that is what I get from the "An Introduction to Bioconductor's
>>>>>>>>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>>>>>>>>> Gentleman. Everything seemed to be going smoothly until I tried to get
>>>>>>>>> a QC Report.
>>>>>>>>>
>>>>>>>>> Now, the answer for why I would want to do such a thing is easy. Simply
>>>>>>>>> that I don't know any better :) Just started working with R a few days
>>>>>>>>> ago, but I'm learning.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB
>>>>>>>>> of RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64
>>>>>>>>> bit OS and see if it makes a difference.
>>>>>>>>
>>>>>>>> Well, it won't be much different. The reason a 32-bit OS can only use
>>>>>>>> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
>>>>>>>> needs to use some RAM, so you won't get all 4 Gb there either. The issue
>>>>>>>> is how much RAM can be allocated to a single process, and on a 64-bit OS
>>>>>>>> that gets bumped up significantly.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Jim
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for your insight!
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Rick
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rick,
>>>>>>>>>>
>>>>>>>>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>>>>>>>>> Thanks Jim! How much memory would I need, I currently have 4GB, but
>>>>>>>>>>> have quite a few other programs running in the background...I'll see
>>>>>>>>>>> if closing them helps. Perhaps setting up an "ExpressionSet" would
>>>>>>>>>>> solve the problem. I just started reading up on how to set one of
>>>>>>>>>>> these up yesterday. Will do this and see if the duplicates will go
>>>>>>>>>>> away.
>>>>>>>>>>>
>>>>>>>>>>> The "mydata" originates from CEL files and then I run the RMA
>>>>>>>>>>> analysis on it, but I didn't actually set up a proper ExpressionSet.
>>>>>>>>>>> I'm guessing that doing this might reduce the QCReport PDF file size
>>>>>>>>>>> quite considerably since I won't have any duplication and will make
>>>>>>>>>>> further analysis easier.
>>>>>>>>>>
>>>>>>>>>> How do you run an RMA analysis without setting up a proper
>>>>>>>>>> ExpressionSet? The default behavior is to create one. In addition, why
>>>>>>>>>> would you want to do such a thing? The ExpressionSet class is
>>>>>>>>>> specifically designed to contain these sorts of data.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would
>>>>>>>>>>> running as 64bit still necessitate more RAM?
>>>>>>>>>>
>>>>>>>>>> Probably. The difference isn't efficiency, but the ability to address
>>>>>>>>>> more RAM. A 32-bit OS can still address all the available memory that
>>>>>>>>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>>>>>>>>> want to do all the chips together. As for how much, I don't know.
>>>>>>>>>> Since RAM isn't that expensive these days, you might look at maxing
>>>>>>>>>> your box out.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Jim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks again,
>>>>>>>>>>> Rick
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Rick,
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>>>>>>>>> Dear All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have recently entered the world of R. Through some trial and error
>>>>>>>>>>>>> I'm becoming more familiar with R and the relevant Bioconductor Affy
>>>>>>>>>>>>> packages. I'm a molecular and cell biologist with rudimentary
>>>>>>>>>>>>> statistical knowledge and even less knowledge with respect to R.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I enter the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>>>>>>>>>
>>>>>>>>>>>>> I get some errors in return.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Loading required package: lattice
>>>>>>>>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>>>>>>>>>
>>>>>>>>>>>> This indicates that you need more RAM, as you are running out of
>>>>>>>>>>>> memory.
>>>>>>>>>>>>
>>>>>>>>>>>>> In addition: Warning message:
>>>>>>>>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>>>>>>>>          some row.names duplicated:
>>>>>>>>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,5
>>>>>>>>>>>>> 4,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,1
>>>>>>>>>>>>> 03,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,1
>>>>>>>>>>>>> 47,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,1
>>>>>>>>>>>>> 73,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,2
>>>>>>>>>>>>> 10,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,2
>>>>>>>>>>>>> 52,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,2
>>>>>>>>>>>>> 96,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,3
>>>>>>>>>>>>> 38,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,3
>>>>>>>>>>>>> 82,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,4
>>>>>>>>>>>>> 07,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,4
>>>>>>>>>>>>> 49,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,4
>>>>>>>>>>>>> 95,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
>>>>>>>>>>>>> truncated]
>>>>>>>>>>>>
>>>>>>>>>>>> What exactly is 'mydata', and how did you generate it? The above
>>>>>>>>>>>> error indicates that you have duplicate row names, which IIRC isn't
>>>>>>>>>>>> possible to do with an ExpressionSet.
>>>>>>>>>>>>
>>>>>>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>>>>>>>> *** error: can't allocate region
>>>>>>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>>>>>>>> *** error: can't allocate region
>>>>>>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>>>>>>
>>>>>>>>>>>> More lack of memory errors.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>>>>>>>>          unused argument(s) (htmlhelp = TRUE)
>>>>>>>>>>>>> In addition: Warning messages:
>>>>>>>>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>>>>>>          datasets have been moved from package 'base' to package
>>>>>>>>>>>>> 'datasets'
>>>>>>>>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>>>>>>          datasets have been moved from package 'stats' to package
>>>>>>>>>>>>> 'datasets'
>>>>>>>>>>>>> starting httpd help server ... done
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would someone be able to diagnose the problem and suggest a
>>>>>>>>>>>>> solution?
>>>>>>>>>>>>
>>>>>>>>>>>> First, get more RAM. Second, you will be better off using a 64-bit
>>>>>>>>>>>> OS. Depending on your hardware, you might be able to just install a
>>>>>>>>>>>> 64-bit version of R.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Jim
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> If it is useful, I am using the following R software: R for Mac OS X
>>>>>>>>>>>>> GUI 1.35-dev Leopard build 32-bit. If there is any other info that
>>>>>>>>>>>>> would be useful please let me know.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had a read of the AffyQCReport Package pdf and I have added the
>>>>>>>>>>>>> following line: QCReport(ReadAffy(widget=TRUE)). Then I tried
>>>>>>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf") again.
>>>>>>>>>>>>> It now seems to be doing something, in other words it doesn't go to
>>>>>>>>>>>>> the error, yet, but it's been processing for about 10 minutes. I am
>>>>>>>>>>>>> analyzing 35 chips.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps it would work if I tried to generate each QCReport page
>>>>>>>>>>>>> separately rather than as a whole.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cordially,
>>>>>>>>>>>>> Rick
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>>>>> Search the archives:
>>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


