[Bioc-devel] Package download when using functions from affy and oligo

Ben Bolstad bmb @ending from bmbol@t@d@com
Tue May 15 21:54:08 CEST 2018


One of the CEL files is truncated or otherwise corrupted. The
appropriate place to really detect that is in the code that reads the
CEL file data, which should then warn the user or raise an error.
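A read-time check of the kind described above could be as simple as counting exactly-zero intensities per array. The C sketch below is illustrative only: first_suspect_column is a hypothetical name, not a function in affy, oligo, or preprocessCore, and it assumes the intensities sit in a column-major matrix with one column per array (the layout of an R matrix handed to C code). It would flag a column like GSM907863.CEL.gz, which has 686388 zero cells in Martin's colSums() output.

```c
#include <stddef.h>

/*
 * Hypothetical read-time sanity check (not part of affy, oligo or
 * preprocessCore). x is assumed to be a rows x cols intensity matrix in
 * column-major order, one column per array.
 *
 * Returns the index of the first column whose fraction of exactly-zero
 * intensities exceeds max_zero_frac, or -1 if every column passes.
 */
int first_suspect_column(const double *x, size_t rows, size_t cols,
                         double max_zero_frac)
{
    if (cols == 0)
        return -1;              /* nothing to check */
    if (rows == 0)
        return 0;               /* an array with no cells cannot be valid */

    for (size_t j = 0; j < cols; j++) {
        size_t zeros = 0;
        for (size_t i = 0; i < rows; i++) {
            if (x[j * rows + i] == 0.0)
                zeros++;
        }
        /* A large modal spike at 0 suggests a truncated or corrupted file. */
        if ((double) zeros / (double) rows > max_zero_frac)
            return (int) j;
    }
    return -1;
}
```

The caller would turn a non-negative result into a warning or error naming the offending file. The threshold is an assumption that would need tuning against real arrays, since a handful of zero cells is not necessarily a sign of corruption.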

The bug below occurs because of a massive mismatch between the data and
the assumptions of the RMA background model (essentially, the modal spike
at 0 intensity throws it completely off). I think the only time I've seen
it occur in the past was also with corrupted data. Either way, a
segfault is a less than desirable outcome.
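The listing in Martin's gdb session shows the pattern behind the crash: find_max() returns max_y, and an unbounded do/while loop then scans dens_y for an element equal to it. If no element compares equal, the loop never breaks and i walks past the 16384 values passed to find_max(), consistent with the i = 1821306 that gdb printed. One way that can happen, offered here as an assumption since the value of max_y is not shown, is NaN densities: NaN compares unequal to everything, including itself. A minimal bounded alternative in C follows; argmax_finite is a hypothetical name, not the actual preprocessCore code or fix.

```c
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical bounded replacement for the "find the max, then scan for it"
 * pattern in the gdb listing. It finds the index of the largest finite value
 * in a single pass, so it can never index past n. NaN and infinite values
 * are skipped; if no finite value exists it returns -1 so the caller can
 * raise an error instead of crashing. Ties resolve to the first index,
 * as in the original scan.
 */
long argmax_finite(const double *y, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(y[i]))
            continue;
        if (best < 0 || y[i] > y[best])
            best = (long) i;
    }
    return best;
}
```

Because the index is tracked during the same pass that finds the maximum, there is no second search that can overrun the array. A caller such as max_density would check for -1 and report an error (in package code, through R's error()) instead of reading past the end of the density vector.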

Ben

On 2018-05-14 17:11, Martin Morgan wrote:
> One of your CEL files is funky
> 
>> colSums(exprs(alldata) == 0)
> GSM907854.CEL.gz GSM907866.CEL.gz GSM907857.CEL.gz GSM907863.CEL.gz
>                0                0                0           686388
> GSM907856.CEL.gz GSM907862.CEL.gz GSM907855.CEL.gz GSM907861.CEL.gz
>                0                0                0                0
> 
> and it's tickling a bug in preprocessCore
> 
> $ R -d gdb -f testOligo.R
> ...
> 0x00007fffe190418a in max_density (z=0x7fffcc0008c0, rows=0, cols=1, column=0)
>     at rma_background4.c:128
> 128	rma_background4.c: No such file or dire
> 
> (gdb) dir /home/mtmorgan/b/git/preprocessCore/src/
> Source directories searched: 
> /home/mtmorgan/b/git/preprocessCore/src:$cdir:$cwd
> (gdb) l
> 123
> 124	  max_y = find_max(dens_y,16384);
> 125
> 126	  i = 0;
> 127	  do {
> 128	    if (dens_y[i] == max_y)
> 129	      break;
> 130	    i++;
> 131
> 132	  } while(1);
> (gdb) p i
> $1 = 1821306
> (gdb) p max_y
> 
> 
> Maybe one of the preprocessCore pros will chime in...
> 
> Martin
> 
> On 05/14/2018 09:15 AM, Joris Meys wrote:
>> Dear all,
>> 
>> Sorry for the delayed response; due to some unfortunate events I had to
>> prioritize my family the past week.
>> 
>> You can find an RStudio project in a zipped folder at this link:
>> https://jorismeys.stackstorage.com/s/3ik0vMwsvueuT5a
>> 
>> It contains a script called testOligo.R that can be sourced and nukes
>> my R session in the second step of the rma() function. It also contains
>> the faulty .gz files. If you need more information, don't hesitate to
>> contact me.
>> 
>> Regarding improving general maintainability, I'm willing to help out on
>> that. The problem is that I'm rather behind with my own work, so I'm short
>> on time for the moment. I'll fork affy tomorrow (I need to teach a class
>> now), and then we can start from there?
>> 
>> Cheers
>> Joris
>> 
>> 
>> 
>> On Sat, May 5, 2018 at 5:17 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>> 
>>> How about Google Drive? This problem of autodownloading should be
>>> addressed directly.
>>> These facilities are still important, but their maintenance is clearly a
>>> lower priority as the technologies they handle have diminished use in the
>>> field. I think we should be able to team up and remove the autoinstallation
>>> elements of these packages, and perhaps improve general maintainability.
>>> Joris, can you pick one, make a GitHub repo that we can collaborate on
>>> revising, and then we can start? It will involve a deprecation process.
>>> 
>>> On Sat, May 5, 2018 at 10:54 AM, Joris Meys <jorismeys at gmail.com> wrote:
>>> 
>>>> Thank you for the answer.
>>>> 
>>>> I was trying to create a reproducible example before I vented, maybe a
>>>> bit too much, in my previous mail.
>>>> 
>>>> I managed to get closer to the problem, and it is related to data that
>>>> was corrupted at download. I can send you a reproducible example that
>>>> bombs R, but I will have to send the specific data files as well. What is
>>>> the best way to send them?
>>>> 
>>>> Cheers
>>>> Joris
>>>> 
>>>> On Sat, 5 May 2018, 00:09 James W. MacDonald, <jmacdon at uw.edu> wrote:
>>>> 
>>>>> I think there are multiple complaints here, so I'll take them one at a
>>>>> time.
>>>>> 
>>>>> On Fri, May 4, 2018 at 3:56 PM, Obenchain, Valerie <
>>>>> Valerie.Obenchain at roswellpark.org> wrote:
>>>>> 
>>>>>> Joris,
>>>>>> 
>>>>>> Sorry I don't have much to offer here. I've cc'd the authors of oligo
>>>>>> and affy who may have some insight.
>>>>>> 
>>>>>> Valerie
>>>>>> 
>>>>>> 
>>>>>> On 05/02/2018 11:35 AM, Joris Meys wrote:
>>>>>> 
>>>>>> Dear,
>>>>>> 
>>>>>> I've noticed that certain functions in affy and oligo (e.g.
>>>>>> oligo::read.celfiles and affy::bg.correct) start by downloading another
>>>>>> package and end either with R crashing or with a warning that, after the
>>>>>> installation succeeded, the package is not available.
>>>>> 
>>>>> 
>>>>> This is true for oligo, and perhaps a bit annoying. If you don't have the
>>>>> package installed already, it gets the package, installs it, and then
>>>>> says it's not available. This is an easy enough fix.
>>>>> 
>>>>> 
>>>>>> After that, using some functions of both packages still crashes R.
>>>>>> 
>>>>> 
>>>>> I don't know what to do with that. What functions?
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The warning I get when calling oligo::read.celfiles() on a single CEL
>>>>>> file, right after installing it, is about the pd.hugene.1.0.st.v1
>>>>>> package. The even more annoying thing is that on my machine it insists
>>>>>> on building from source, whereas on another Windows machine without
>>>>>> Rtools, it downloads a binary.
>>>>>> 
>>>>> 
>>>>> That is an options setting that gets changed when you install Rtools. The
>>>>> 'pkgType' option gets set to 'both' because you can now install both
>>>>> kinds. And in install.packages it ends up getting switched from 'both' to
>>>>> 'source'. I haven't dug any further into that because I am not sure I see
>>>>> why it's a problem. In the end there isn't a difference between installing
>>>>> a source or a binary pdInfoPackage, and trying to get it to 'do the right
>>>>> thing' might have some unforeseen consequences that I would rather not
>>>>> have to worry about. This is really an 'if it ain't broke, don't fix it'
>>>>> scenario, IMO.
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The reason it frustrates the heck out of me is that both affy and oligo
>>>>>> crashed the R session in different ways: during installation of a
>>>>>> package, during use of a function, and at different points when
>>>>>> comparing my machine with that of one of our students. The culprit seems
>>>>>> to be in one of the underlying packages, but I wasn't even able to detect
>>>>>> which package is the culprit, let alone which function crashes
>>>>>> everything.
>>>>>> 
>>>>> 
>>>>> I understand your frustration, but that's not enough to go on. I have
>>>>> never, in like 18 years, had either oligo or affy randomly segfault on
>>>>> me. I understand that it is happening for you, but unless you can come up
>>>>> with a reproducible example, it's not possible for anybody to help.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Is there a way around this so I can ensure that at least I have the
>>>>>> same setup as they have, and so I can try to come up with a
>>>>>> reproducible example to report this critical bug?
>>>>>> 
>>>>> 
>>>>> Again, I am not sure what to do with that. I am not sure what 'a way
>>>>> around this' pertains to, and ensuring you have the same setup as 'they
>>>>> have' seems to be something only you can accomplish. Is there some reason
>>>>> you cannot ensure that you have the same setup on two different
>>>>> computers?
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Thank you in advance
>>>>>> Joris
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> This email message may contain legally privileged and/or confidential
>>>>>> information. If you are not the intended recipient(s), or the employee
>>>>>> or agent responsible for the delivery of this message to the intended
>>>>>> recipient(s), you are hereby notified that any disclosure, copying,
>>>>>> distribution, or use of this email message is prohibited. If you have
>>>>>> received this message in error, please notify the sender immediately by
>>>>>> e-mail and delete this email message from your computer. Thank you.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


