[BioC] Unable to Generate QC Report for mogene10stv1

Sat Dec 18 03:24:17 CET 2010

Hey Jim,

Ok, I will give that a go. The only problem is that an ExpressionSet contains
all of the information needed for further analysis (e.g. phenoData,
featureData and annotation, including treatment type, cell type, time
points and replicates). I am still learning how to include all of these for a
complete ExpressionSet. As a starting point I've loaded a txt file
containing some of this information (gene abbreviation, ontology, probeset ID),
which I created using Affymetrix's Expression Console software, without
replicate, time point and cell type info. Doing this I've gotten as far as
creating a minimal ExpressionSet. I guess the functions you mention below do
just that, but with only the information contained in the CEL files.
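
For reference, a minimal sketch of how sample information (treatment, time
point, replicate) might be lined up with the expression values. The sample
names, labels and numbers are made up for illustration, and the step that
builds the ExpressionSet assumes the Biobase package is installed (it is
skipped otherwise).

```r
# Hypothetical expression matrix: 3 probesets x 4 samples (values are made up).
expr <- matrix(c(7.1, 7.3, 9.8, 9.9,
                 5.2, 5.0, 5.1, 5.3,
                 8.4, 8.6, 6.1, 6.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3"),
                               c("a.CEL", "b.CEL", "c.CEL", "d.CEL")))

# Hypothetical sample annotation: one row per CEL file.
pheno <- data.frame(treatment = c("ctrl", "ctrl", "treated", "treated"),
                    timepoint = c(0, 0, 24, 24),
                    replicate = c(1, 2, 1, 2),
                    row.names = c("a.CEL", "b.CEL", "c.CEL", "d.CEL"))

# The sample table has to line up with the columns of the expression matrix.
aligned <- identical(rownames(pheno), colnames(expr))

# Build the ExpressionSet only if the table is aligned and Biobase is available.
eset <- NULL
if (aligned && requireNamespace("Biobase", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(
    assayData = expr,
    phenoData = Biobase::AnnotatedDataFrame(pheno))
}
```

With an object returned by rma(), the same kind of table could presumably be
attached afterwards with pData(eset) <- pheno, provided its row names match
sampleNames(eset).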

In any case, since, as you say, the functions in the online manual create a
proper ExpressionSet, why would I get the warning about duplicated row names?
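
One way to look into this is to check whatever supplies the row names for
repeats. The base-R sketch below uses a made-up vector of identifiers (gene
abbreviations can repeat when several probesets map to one gene, whereas
probeset IDs should be unique). It is an illustration only, not a diagnosis of
the data in this thread.

```r
# Hypothetical identifiers, e.g. gene abbreviations read from an annotation file.
ids <- c("Actb", "Gapdh", "Actb", "Tnf", "Gapdh", "Il6")

# TRUE if at least one identifier occurs more than once.
has_dups <- anyDuplicated(ids) > 0

# Positions of the second and later occurrences of each repeated identifier.
dup_pos <- which(duplicated(ids))

# The identifiers that repeat.
dup_ids <- unique(ids[dup_pos])

# One possible workaround: make the names unique by appending .1, .2, ...
unique_ids <- make.unique(ids)
```

The numbers listed in the warning look like positions of this kind, so
comparing them against which(duplicated(...)) on the candidate columns may
show which column is being used as row names.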

Regarding the 64-bit discussion: it may very well have made enough of a
difference, as the memory error did not come up the last time I tried it.
I'm going to upgrade to 8GB RAM anyway; it can't hurt.
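
As a rough check on the memory numbers, the arithmetic below relates the two
sizes reported in the error messages further down the thread (a 437.4 Mb
vector and an mmap request of 458665984 bytes). Only those two figures come
from the thread; the variable names are mine, and the count of values is
approximate because the request also covers some overhead.

```r
# Size of the failed request, taken from the malloc/mmap error message.
mmap_bytes <- 458665984

# R reports "Mb" in units of 1024^2 bytes.
mb <- mmap_bytes / 1024^2
mb_label <- sprintf("%.1f", mb)

# Approximate number of 8-byte numeric values in a block of that size.
n_doubles <- mmap_bytes / 8

# Upper bound on what a 32-bit process can address, in units of 1024^3 bytes.
# In practice the usable amount is lower.
addr_32bit_gb <- 2^32 / 1024^3
```

So the two messages describe the same single allocation, and one block of that
size is already a sizeable fraction of what a 32-bit R process can address.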

Cheers,
Rick

On 17/12/10 7:20 AM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:

> Hi Rick,
> 
> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>> Hi Jim,
>> 
>> How do I run an RMA analysis without a proper ExpressionSet? Honest answer, I
>> don't know. I just put in a command line from a manual I found online and it
>> spat out some result; see #3 Affy packages in the following link (
>> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro).
> 
> You are mistaken. All of the functions mentioned there result in a
> proper ExpressionSet. And if you just do
> 
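> library(affy)          # provides ReadAffy() and rma()
> abatch <- ReadAffy()   # read all CEL files in the working directory
> eset <- rma(abatch)    # background correct, normalize and summarize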
> 
> Then you will 100% surely get an ExpressionSet.
> 
>> 
>> Perhaps you don't need an ExpressionSet until after the preprocessing, at
>> least that is what I get from the "An Introduction to Bioconductor's
>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>> Gentleman. Everything seemed to be going smoothly until I tried to get a QC
>> Report.
>> 
>> Now, the answer for why I would want to do such a thing is easy. Simply that
>> I don't know any better :) Just started working with R a few days ago, but
>> I'm learning.
>> 
>> 
>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB of
>> RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64 bit OS and
>> see if it makes a difference.
> 
> Well, it won't be much different. The reason a 32-bit OS can only use
> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
> needs to use some RAM, so you won't get all 4 Gb there either. The issue
> is how much RAM can be allocated to a single process, and on a 64-bit OS
> that gets bumped up significantly.
> 
> Best,
> 
> Jim
> 
> 
> 
>> 
>> Thanks for your insight!
>> 
>> Cheers,
>> Rick
>> 
>> 
>> 
>> 
>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu>  wrote:
>> 
>>> Hi Rick,
>>> 
>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>> Thanks Jim! How much memory would I need, I currently have 4GB, but have
>>>> quite a few other programs running in the background...I'll see if closing
>>>> them helps. Perhaps setting up an "ExpressionSet" would solve the problem. I
>>>> just started reading up on how to set one of these up yesterday. Will do
>>>> this and see if the duplicates will go away.
>>>> 
>>>> The "mydata" originates from CEL files and then I run the RMA analysis on
>>>> it, but I didn't actually set up a proper ExpressionSet. I'm guessing that
>>>> doing this might reduce the QCReport PDF file size quite considerably since
>>>> I won't have any duplication and will make further analysis easier.
>>> 
>>> How do you run an RMA analysis without setting up a proper
>>> ExpressionSet? The default behavior is to create one. In addition, why
>>> would you want to do such a thing? The ExpressionSet class is
>>> specifically designed to contain these sorts of data.
>>> 
>>> 
>>>> 
>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would running as
>>>> 64bit still necessitate more RAM?
>>> 
>>> Probably. The difference isn't efficiency, but the ability to address
>>> more RAM. A 32-bit OS can still address all the available memory that
>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>> want to do all the chips together. As for how much, I don't know. Since
>>> RAM isn't that expensive these days, you might look at maxing your box out.
>>> 
>>> Best,
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> Thanks again,
>>>> Rick
>>>> 
>>>> 
>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu>   wrote:
>>>> 
>>>>> Hi Rick,
>>>>> 
>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>> Dear All,
>>>>>> 
>>>>>> I have recently entered the world of R. Through some trial and error I'm
>>>>>> becoming more familiar with R and the relevant Bioconductor Affy packages.
>>>>>> I'm a molecular and cell biologist with rudimentary statistical knowledge
>>>>>> and even less knowledge with respect to R.
>>>>>> 
>>>>>> When I enter the following:
>>>>>> 
>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>> 
>>>>>> I get some errors in return.
>>>>>> 
>>>>>> Loading required package: lattice
>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>> 
>>>>> This indicates that you need more RAM, as you are running out of memory.
>>>>> 
>>>>>> In addition: Warning message:
>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>      some row.names duplicated:
>>>>>> 
>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,
>>>>>> 54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,
>>>>>> 103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,
>>>>>> 147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,
>>>>>> 173,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,
>>>>>> 210,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,
>>>>>> 252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,
>>>>>> 296,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,
>>>>>> 338,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,
>>>>>> 382,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,
>>>>>> 407,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,
>>>>>> 495,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [...
>>>>>> truncated]
>>>>> 
>>>>> What exactly is 'mydata', and how did you generate it? The above error
>>>>> indicates that you have duplicate row names, which IIRC isn't possible
>>>>> to do with an expressionSet.
>>>>> 
>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>> code=12)
>>>>>> *** error: can't allocate region
>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>> code=12)
>>>>>> *** error: can't allocate region
>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>> 
>>>>> More lack of memory errors.
>>>>> 
>>>>> 
>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>      unused argument(s) (htmlhelp = TRUE)
>>>>>> In addition: Warning messages:
>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>      datasets have been moved from package 'base' to package 'datasets'
>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>      datasets have been moved from package 'stats' to package 'datasets'
>>>>>> starting httpd help server ... done
>>>>>> 
>>>>>> Would someone be able to diagnose the problem and suggest a solution?
>>>>> 
>>>>> First, get more RAM. Second, you will be better off using a 64-bit OS.
>>>>> Depending on your hardware, you might be able to just install a 64-bit
>>>>> version of R.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> If it is useful, I am using the following R software: R for Mac OS X GUI
>>>>>> 1.35-dev Leopard build 32-bit. If there is any other info that would be
>>>>>> useful please let me know.
>>>>>> 
>>>>>> I had a read of the AffyQCReport Package pdf and I have added the following
>>>>>> line: QCReport(ReadAffy(widget=TRUE)). Then I tried library(affyQCReport);
>>>>>> QCReport(mydata, file="ExampleQC.pdf") again. It now seems to be doing
>>>>>> something, in other words it doesn't go to the error, yet, but it's been
>>>>>> processing for about 10 minutes. I am analyzing 35 chips.
>>>>>> 
>>>>>> Perhaps it would work if I tried to generate each QCReport page separately
>>>>>> rather than as a whole.
>>>>>> 
>>>>>> Cordially,
>>>>>> Rick
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 

-- 
Rick Frausto
PhD Candidate
The University of Sydney
School of Molecular Bioscience G08
Camperdown, NSW 2006 AUSTRALIA
ricardo.frausto at sydney.edu.au
Phone: 61 2 9036 5354
Lab of Iain L. Campbell