[BioC] Unable to Generate QC Report for mogene10stv1
Rick Frausto
ricardo.frausto at sydney.edu.au
Fri Jan 7 20:14:43 CET 2011
Hi James,
Below is the information that you requested: traceback() and sessionInfo().
It doesn't seem like much to me, but perhaps you can help. Since you answer a
lot of e-mails, I thought I'd remind you that this is in regard to the "some
row.names duplicated" error.
Hope your holidays were good!
-Rick
[R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]
[Workspace restored from /Users/rickfrausto/.RData]
[History restored from /Users/rickfrausto/.Rapp.history]
> library(affy)
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
> mydata <- ReadAffy()
> eset <- rma(mydata)
Background correcting
Normalizing
Calculating Expression
> write.exprs(eset, file="mydata.txt")
> mypm <- pm(mydata)
> mymm <- mm(mydata)
> myaffyids <- probeNames(mydata)
> result <- data.frame(myaffyids, mypm, mymm)
> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
Loading required package: lattice
Warning message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated:
4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,5
4,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,1
03,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,1
47,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,1
73,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,2
10,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,2
52,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,2
96,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,3
38,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,3
82,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,4
07,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,4
49,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,4
95,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
truncated]
Error in plot(qc(object)) :
error in evaluating the argument 'x' in selecting a method for function
'plot'
> traceback()
2: plot(qc(object))
1: QCReport(mydata, file = "ExampleQC.pdf")
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] affyQCReport_1.28.1 lattice_0.19-13 mogene10stv1cdf_2.7.0
[4] affy_1.28.0 Biobase_2.10.0
loaded via a namespace (and not attached):
[1] affyio_1.18.0 affyPLM_1.26.0 annotate_1.28.0
[4] AnnotationDbi_1.12.0 Biostrings_2.18.2 DBI_0.2-5
[7] gcrma_2.22.0 genefilter_1.32.0 grid_2.12.0
[10] IRanges_1.8.7 preprocessCore_1.12.0 RColorBrewer_1.0-2
[13] RSQLite_0.9-4 simpleaffy_2.26.1 splines_2.12.0
[16] survival_2.36-2 tools_2.12.0 xtable_1.5-6
>
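A side note on reading the warning above: the numbers it lists appear to be positions of repeated names, not the names themselves. Below is a minimal base-R sketch of how such positions can be found. The vector `ids` and its values are hypothetical, a toy stand-in for something like the output of probeNames(mydata), which repeats a probe set ID once per probe.

```r
# Toy stand-in for a vector of probe set IDs (hypothetical values).
ids <- c("10338001", "10338002", "10338003",
         "10338001", "10338004", "10338002")

# Positions of entries that repeat an earlier entry.
dup_pos <- which(duplicated(ids))

# Number of entries per ID, to see which IDs occur more than once.
id_counts <- table(ids)

print(dup_pos)
print(id_counts[id_counts > 1])
```

This only locates the repeats; it does not say whether they are a problem for the QC report.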
On 20/12/10 6:33 AM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:
> Hi Rick,
>
> On 12/17/2010 9:24 PM, Rick Frausto wrote:
>> Hey Jim,
>>
>> Ok, I will give that a go. The only problem is an ExpressionSet contains all
>> of the necessary information for further analysis (e.g. phenodata,
>> featuredata and annotation, etc - including, treatment type, cell type, time
>> points, replicates). I am still learning how to include all of these for a
>> complete ExpressionSet. As a starting point I've loaded a txt file
>> containing some of this information (gene abbrev, ontology, probeset ID)
>> which I created using Affymetrix's Expression Console software, without
>> replicate, time point and cell type info. Doing this I've gotten as far as
>> creating a minimal ExpressionSet, which I guess the functions you mention
>> below do just that but with the information contained in the CEL file only.
>>
>> In any case, since as you say, the functions in the online manual create a
>> proper ExpressionSet why would I get the issue of duplication?
>
> Oh yeah, the original question ;-D. Try running QCReport() again, and
> when it errors out run traceback() and send the output. Also include the
> output of sessionInfo().
>
> Jim
>
>
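A rough sketch of collecting this kind of information from a script rather than interactively. The function `failing_step` is hypothetical and only stands in for a call that errors out, such as the QCReport() call above. Note that traceback() is only useful right after an uncaught error at the prompt, so the sketch keeps the condition object instead.

```r
# Hypothetical stand-in for a call that fails, such as QCReport(mydata, ...).
failing_step <- function() stop("error in evaluating the argument 'x'")

# tryCatch() returns the condition object instead of aborting the script.
err <- tryCatch(failing_step(), error = function(e) e)

msg <- conditionMessage(err)   # the error text
where <- conditionCall(err)    # the call that raised the error
print(msg)
print(where)

# Version details worth pasting into a report to the list.
si <- sessionInfo()
print(si$R.version$version.string)
```

At the prompt, running the failing call directly and then typing traceback() and sessionInfo() gives the same information with less effort.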
>>
>> In regards to the 64-bit discussion. It may have very well made enough of a
>> difference as it did not come up with the memory error the last time I tried
>> it. Going to upgrade to 8GB RAM anyways, can't hurt.
>>
>> Cheers,
>> Rick
>>
>>
>> On 17/12/10 7:20 AM, "James W. MacDonald"<jmacdon at med.umich.edu> wrote:
>>
>>> Hi Rick,
>>>
>>> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>>>> Hi Jim,
>>>>
>>>> How do I run an RMA analysis without a proper ExpressionSet? Honest answer: I
>>>> don't know. I just put in a command line from a manual I found online and it
>>>> spit out some result; see #3, Affy packages, in the following link
>>>> (http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro).
>>>
>>> You are mistaken. All of the functions mentioned there result in a
>>> proper ExpressionSet. And if you just do
>>>
>>> abatch<- ReadAffy()
>>> eset<- rma(abatch)
>>>
>>> Then you will 100% surely get an ExpressionSet.
>>>
>>>>
>>>> Perhaps you don't need an ExpressionSet until after the preprocessing, at
>>>> least that is what I get from the "An Introduction to Bioconductor's
>>>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>>>> Gentleman. Everything seemed to be going smoothly until I tried to get a QC
>>>> Report.
>>>>
>>>> Now, the answer for why I would want to do such a thing is easy: simply that
>>>> I don't know any better :) I just started working with R a few days ago, but
>>>> I'm learning.
>>>>
>>>>
>>>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB of
>>>> RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64 bit OS and
>>>> see if it makes a difference.
>>>
>>> Well, it won't be much different. The reason a 32-bit OS can only use
>>> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
>>> needs to use some RAM, so you won't get all 4 Gb there either. The issue
>>> is how much RAM can be allocated to a single process, and on a 64-bit OS
>>> that gets bumped up significantly.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Thanks for your insight!
>>>>
>>>> Cheers,
>>>> Rick
>>>>
>>>>
>>>>
>>>>
>>>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu> wrote:
>>>>
>>>>> Hi Rick,
>>>>>
>>>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>>>> Thanks Jim! How much memory would I need? I currently have 4GB, but have
>>>>>> quite a few other programs running in the background... I'll see if closing
>>>>>> them helps. Perhaps setting up an "ExpressionSet" would solve the problem. I
>>>>>> just started reading up on how to set one of these up yesterday. Will do
>>>>>> this and see if the duplicates will go away.
>>>>>>
>>>>>> The "mydata" originates from CEL files and then I run the RMA analysis on
>>>>>> it, but I didn't actually set up a proper ExpressionSet. I'm guessing that
>>>>>> doing this might reduce the QCReport PDF file size quite considerably since
>>>>>> I won't have any duplication and will make further analysis easier.
>>>>>
>>>>> How do you run an RMA analysis without setting up a proper
>>>>> ExpressionSet? The default behavior is to create one. In addition, why
>>>>> would you want to do such a thing? The ExpressionSet class is
>>>>> specifically designed to contain these sorts of data.
>>>>>
>>>>>
>>>>>>
>>>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would running as
>>>>>> 64bit still necessitate more RAM?
>>>>>
>>>>> Probably. The difference isn't efficiency, but the ability to address
>>>>> more RAM. A 32-bit OS can still address all the available memory that
>>>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>>>> want to do all the chips together. As for how much, I don't know. Since
>>>>> RAM isn't that expensive these days, you might look at maxing your box
>>>>> out.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks again,
>>>>>> Rick
>>>>>>
>>>>>>
>>>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu> wrote:
>>>>>>
>>>>>>> Hi Rick,
>>>>>>>
>>>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>>>> Dear All,
>>>>>>>>
>>>>>>>> I have recently entered the world of R. Through some trial and error I'm
>>>>>>>> becoming more familiar with R and the relevant Bioconductor Affy packages.
>>>>>>>> I'm a molecular and cell biologist with rudimentary statistical knowledge
>>>>>>>> and even less knowledge with respect to R.
>>>>>>>>
>>>>>>>> When I enter the following:
>>>>>>>>
>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>>>>
>>>>>>>> I get some errors in return.
>>>>>>>>
>>>>>>>> Loading required package: lattice
>>>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>>>>
>>>>>>> This indicates that you need more RAM, as you are running out of memory.
>>>>>>>
>>>>>>>> In addition: Warning message:
>>>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>>> some row.names duplicated:
>>>>>>>>
>>>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,5
>>>>>>>> 4,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,1
>>>>>>>> 03,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,1
>>>>>>>> 47,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,1
>>>>>>>> 73,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,2
>>>>>>>> 10,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,2
>>>>>>>> 52,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,2
>>>>>>>> 96,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,3
>>>>>>>> 38,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,3
>>>>>>>> 82,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,4
>>>>>>>> 07,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,4
>>>>>>>> 49,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,4
>>>>>>>> 95,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
>>>>>>>> truncated]
>>>>>>>
>>>>>>> What exactly is 'mydata', and how did you generate it? The above error
>>>>>>> indicates that you have duplicate row names, which IIRC isn't possible
>>>>>>> to do with an expressionSet.
>>>>>>>
>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>>>> code=12)
>>>>>>>> *** error: can't allocate region
>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>>>> code=12)
>>>>>>>> *** error: can't allocate region
>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>
>>>>>>> More lack of memory errors.
>>>>>>>
>>>>>>>
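The sizes in the two memory messages quoted above can be checked against each other. This is only back-of-the-envelope arithmetic in base R, assuming the "Mb" in R's message means 2^20 bytes; the variable names are made up for illustration.

```r
# Size reported by malloc in the log above, in bytes.
bytes_requested <- 458665984

# Convert to units of 2^20 bytes for comparison with R's "Mb" figure.
mb_requested <- bytes_requested / 2^20
print(round(mb_requested, 1))

# Address space of a 32-bit process: 2^32 bytes, in units of 2^30 bytes.
addr_space_gb_32bit <- 2^32 / 2^30
print(addr_space_gb_32bit)
```

So the failed mmap request and the "cannot allocate vector of size 437.4 Mb" error describe the same allocation, and a single 32-bit process cannot address more than 4 GB no matter how much RAM is installed.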
>>>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>>> unused argument(s) (htmlhelp = TRUE)
>>>>>>>> In addition: Warning messages:
>>>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>>> datasets have been moved from package 'base' to package
>>>>>>>> 'datasets'
>>>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>>> datasets have been moved from package 'stats' to package
>>>>>>>> 'datasets'
>>>>>>>> starting httpd help server ... done
>>>>>>>>
>>>>>>>> Would someone be able to diagnose the problem and suggest a solution?
>>>>>>>
>>>>>>> First, get more RAM. Second, you will be better off using a 64-bit OS.
>>>>>>> Depending on your hardware, you might be able to just install a 64-bit
>>>>>>> version of R.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> If it is useful, I am using the following R software: R for Mac OS X GUI
>>>>>>>> 1.35-dev Leopard build 32-bit. If there is any other info that would be
>>>>>>>> useful please let me know.
>>>>>>>>
>>>>>>>> I had a read of the AffyQCReport Package pdf and I have added the following
>>>>>>>> line: QCReport(ReadAffy(widget=TRUE)). Then I tried library(affyQCReport);
>>>>>>>> QCReport(mydata, file="ExampleQC.pdf") again. It now seems to be doing
>>>>>>>> something, in other words it doesn't go to the error, yet, but it's been
>>>>>>>> processing for about 10 minutes. I am analyzing 35 chips.
>>>>>>>>
>>>>>>>> Perhaps it would work if I tried to generate each QCReport page separately
>>>>>>>> rather than as a whole.
>>>>>>>>
>>>>>>>> Cordially,
>>>>>>>> Rick
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>
>>
--
Rick Frausto
PhD Candidate
The University of Sydney
School of Molecular Bioscience G08
Camperdown, NSW 2006 AUSTRALIA
ricardo.frausto at sydney.edu.au
Phone: 61 2 9036 5354
Lab of Iain L. Campbell